Clustering metrics
Measuring the performance of a clustering algorithm is a little trickier than measuring classification or regression, because clustering is unsupervised: the algorithm never sees the labels, so there is no target to score it against during training. Thankfully, scikit-learn provides metrics for this case as well.
Getting ready
To measure clustering performance, start by loading the iris dataset. We will relabel the iris flowers as two types: type 0 when the target is 0, and type 1 when the target is 1 or 2:
from sklearn.datasets import load_iris
import numpy as np
iris = load_iris()
X = iris.data
y = np.where(iris.target == 0, 0, 1)
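As a quick sanity check on the relabeling, the sketch below counts how many samples end up in each type. The `counts` variable is introduced here for illustration only and is not part of the recipe.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
# Collapse the three iris species into two types: 0 stays 0, 1 and 2 become 1.
y = np.where(iris.target == 0, 0, 1)

# counts[k] is the number of samples relabeled as type k (illustrative helper).
counts = np.bincount(y)
print(X.shape)
print(counts)
```

Iris contains 50 samples per species, so type 0 holds 50 samples and type 1 holds 100. The two types are unbalanced, which is worth keeping in mind when reading any score computed against `y`.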
How to do it...
- Instantiate a k-means estimator and train it. Because k-means is a clustering algorithm, do not pass the target when fitting:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(X)
- Now import everything necessary to score k-means through cross-validation. We will use the adjusted_rand_score clustering performance metric:
from sklearn.metrics...
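To give a feel for what the adjusted Rand score measures, here is a minimal pure-Python sketch of the adjusted Rand index computed from pair counts. It is an illustration under stated assumptions, not scikit-learn's implementation: the function name `adjusted_rand_index` and the toy label lists are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Illustrative adjusted Rand index (not scikit-learn's code)."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Count pairs of samples that share a cell, a true class, or a cluster.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single group on both sides
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy labelings, for illustration only.
truth = [0, 0, 0, 1, 1, 1]
flipped = [1, 1, 1, 0, 0, 0]   # same grouping, cluster ids swapped
mixed = [0, 0, 1, 1, 1, 1]     # one sample placed in the wrong group

print(adjusted_rand_index(truth, flipped))
print(adjusted_rand_index(truth, mixed))
```

The score depends only on which samples are grouped together, not on the cluster ids themselves, so the swapped labeling still scores a perfect 1.0 while the labeling with one misplaced sample scores lower. This is why the metric suits k-means, whose cluster numbering is arbitrary.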