Hierarchical clustering (HC)
In this section, we discuss the hierarchical clustering technique and its computational challenges. An example using the bisecting K-means algorithm, a hierarchical clustering method available in Spark MLlib, is also shown to make the technique easier to understand.
An overview of the HC algorithm and its challenges
A hierarchical clustering technique differs computationally from centroid-based clustering in the way the distances are computed. It is one of the most popular and widely used cluster analysis techniques, and it seeks to build a hierarchy of clusters. Since a cluster usually consists of multiple objects, the distance between two clusters can be measured in more than one way, depending on which of their objects are compared. Therefore, in addition to the usual choice of distance function, you also need to decide on the linkage criterion to be used. In short, there are two types of strategies in hierarchical clustering:
- Bottom-up approach: In this approach, each observation starts within its own cluster. After that, the pairs of clusters are...