Determining number of clusters
The beauty of a clustering algorithm such as K-means is that it can cluster data with an arbitrary number of features. It is a great tool to use when you have raw data and would like to discover the patterns in it. However, fixing the number of clusters before running the experiment may not work well and can sometimes lead to overfitting or underfitting. Moreover, all three algorithms (that is, K-means, bisecting K-means, and Gaussian mixture) share one requirement: the number of clusters must be determined in advance and supplied to the algorithm as a parameter. Hence, informally, determining the number of clusters is a separate optimization problem to be solved.
In this section, we will use a heuristic approach based on the Elbow method. We start from K = 2 clusters and then run the K-means algorithm on the same dataset with increasing values of K, observing the value of the cost function, the Within-Cluster Sum of Squares...
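The procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this section: the toy dataset, the deterministic farthest-first seeding (a stand-in for the random seeding that K-means implementations normally use), and the helper names sq_dist, init_centers, kmeans_wcss, and pick_elbow are all hypothetical. The last helper picks the K with the largest second difference of the cost curve, which is only one possible way to locate the elbow numerically.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Farthest-first seeding (assumption: chosen here only to keep the
    sketch deterministic; real implementations usually seed randomly)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point whose nearest chosen center is farthest away.
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's K-means iterations and return the Within-Cluster Sum of
    Squares: the sum of squared distances from each point to its center."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def pick_elbow(costs):
    """Return the K with the largest second difference of the cost curve.
    Assumes consecutive K values; the first and last K cannot be chosen."""
    ks = sorted(costs)
    return max(
        ks[1:-1],
        key=lambda k: (costs[k - 1] - costs[k]) - (costs[k] - costs[k + 1]),
    )


# Toy data (hypothetical): three tight groups of four 2-D points each.
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

# Start from K = 2 and increase K, recording the cost for each run.
costs = {k: kmeans_wcss(points, k) for k in range(2, 7)}
for k, cost in costs.items():
    print(f"K = {k}: WCSS = {cost:.3f}")

elbow = pick_elbow(costs)
print("Elbow at K =", elbow)
```

On this toy data the cost falls sharply from K = 2 to K = 3 and only marginally afterwards, so the elbow sits at the true number of groups. On real data the bend is usually less pronounced, and plotting the curve and inspecting it by eye remains a sensible check.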