Evaluating clustering techniques
Because clustering does not predict a number or a category, the evaluation metrics discussed earlier for continuous and discrete variables do not apply directly to clustering techniques. That does not mean we can skip measuring performance: we still need to know how well a clustering algorithm is doing, and for that we need a few clustering-specific evaluation metrics.
Internal clustering evaluation
If we do not have a gold standard set of labels to compare our clusters against, we are limited to evaluating the clustering technique using internal criteria. In other words, we can still evaluate our clustering by measuring similarity and dissimilarity within and between the clusters themselves.
The first internal metric presented here is the silhouette coefficient. It can be calculated for each clustered data point as follows:
\[ s = \frac{b - a}{\max(a, b)} \]
Here, a is the mean distance between...
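To make the per-point calculation concrete, here is a minimal sketch in Python. It assumes the standard definition of the silhouette coefficient, s = (b - a) / max(a, b), where a is the mean distance from the point to the other points in its own cluster and b is the mean distance from the point to the points in the nearest other cluster. The function name `silhouette_coefficient`, the use of Euclidean distance, and the convention of returning 0 for a point that is alone in its cluster are illustrative assumptions, not something prescribed by the text.

```python
import math


def silhouette_coefficient(points, labels, index):
    """Silhouette coefficient s = (b - a) / max(a, b) for points[index].

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    own_label = labels[index]

    # Collect distances from the chosen point to every other point,
    # grouped by the cluster label of the other point.
    distances_by_cluster = {}
    for i, (point, label) in enumerate(zip(points, labels)):
        if i == index:
            continue
        d = math.dist(points[index], point)
        distances_by_cluster.setdefault(label, []).append(d)

    # Assumed convention: a point alone in its cluster gets a score of 0.
    if own_label not in distances_by_cluster:
        return 0.0

    # a: mean distance to the other points in the same cluster.
    own = distances_by_cluster[own_label]
    a = sum(own) / len(own)

    # b: smallest mean distance to the points of any other cluster.
    other_means = [
        sum(ds) / len(ds)
        for label, ds in distances_by_cluster.items()
        if label != own_label
    ]
    if not other_means:
        raise ValueError("at least two clusters are required")
    b = min(other_means)

    denominator = max(a, b)
    return 0.0 if denominator == 0 else (b - a) / denominator


# Two well-separated clusters of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
s = silhouette_coefficient(points, labels, 0)
print(round(s, 3))
```

Values close to 1 indicate that a point sits much nearer to its own cluster than to any other, values near 0 indicate a point on the boundary between clusters, and negative values suggest the point may be assigned to the wrong cluster. Averaging the coefficient over all points gives a single score for the whole clustering.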