A comparative analysis between clustering algorithms
A Gaussian mixture model is usually fitted with expectation maximization (EM), which is an example of an iterative optimization algorithm. Bisecting K-means, which is generally faster than regular K-means, also produces slightly different clustering results. Below we compare these three algorithms in terms of model building time and computational cost. The cost is measured as the Within-Cluster Sum of Squares (WCSS), where a lower value indicates tighter clusters. The following lines of code can be used to compute the WCSS for the K-means and bisecting K-means algorithms:
val WCSSS = model.computeCost(landRDD) // landRDD is the training set
println("Within-Cluster Sum of Squares = " + WCSSS) // Less is better
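To make the meaning of this cost concrete, here is a minimal sketch in plain Scala (no Spark) of how a WCSS value can be computed by hand: for every point, take the squared Euclidean distance to its nearest cluster center and sum the results. The object name `WcssSketch`, the helper `squaredDistance`, and the toy points and centers are hypothetical and chosen only for illustration; they are not part of the chapter's code or of the MLlib API.

```scala
object WcssSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // WCSS: for every point, add the squared distance to its nearest center.
  def wcss(points: Seq[Point], centers: Seq[Point]): Double =
    points.map(p => centers.map(c => squaredDistance(p, c)).min).sum

  def main(args: Array[String]): Unit = {
    // Toy data (an assumption for illustration): two well-separated pairs of points.
    val points = Seq(Array(0.0, 0.0), Array(0.0, 2.0), Array(10.0, 0.0), Array(10.0, 2.0))
    val centers = Seq(Array(0.0, 1.0), Array(10.0, 1.0))
    println("Within-Cluster Sum of Squares = " + wcss(points, centers)) // prints 4.0 for this toy data
  }
}
```

Each toy point lies exactly one unit from its nearest center, so the sum is 4.0. On a real dataset the value depends on the scale of the features, which is why WCSS figures are only comparable between models trained on the same data.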
For the dataset we used throughout this chapter, we got the following values of WCSS:
Within-Cluster Sum of Squares of Bisecting K-means = 2.096980212594632E11
Within-Cluster Sum of Squares of K-means = 1.455560123603583E12
This means that K-means shows slightly...