Understanding k-means clustering
The most essential clustering algorithm that OpenCV provides is k-means clustering, which searches for a predetermined number, k, of clusters (or groups) within an unlabeled multidimensional dataset.
It does so by using two simple assumptions about what an optimal clustering should look like (the sketch after this list shows how they turn into an update loop):
- The center of each cluster is simply the arithmetic mean of all the points belonging to the cluster
- Each point in the cluster is closer to its own center than to other cluster centers
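To see how these two assumptions turn into an algorithm, here is a minimal NumPy sketch. It is an illustration only, not OpenCV's cv2.kmeans, and the function name, arguments, and defaults are assumptions made for this example; it also skips refinements such as careful initialization and empty-cluster handling. The loop repeatedly assigns every point to its nearest center and then recomputes each center as the mean of its assigned points:

import numpy as np

def kmeans_sketch(X, k, n_iter=10, seed=42):
    # illustrative sketch only; names and defaults are assumptions,
    # not OpenCV's API
    rng = np.random.RandomState(seed)
    # start from k randomly chosen data points as the initial centers
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # assumption 2: assign each point to its closest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # assumption 1: move each center to the arithmetic mean of its points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

A real implementation would also test for convergence and deal with clusters that end up empty; the loop above only shows the two alternating update steps.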
It is easiest to understand the algorithm by looking at a concrete example.
Implementing our first k-means example
First, let's generate a 2D dataset containing four distinct blobs. To emphasize that this is an unsupervised approach, we will leave the labels out of the visualization. We will continue using matplotlib for all our visualization purposes:
In [1]: import matplotlib.pyplot as plt
...     %matplotlib inline
...     plt.style.use('ggplot')
Following the same recipe from earlier chapters...
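The rest of that recipe is not reproduced here. As a rough sketch of what generating and plotting such a dataset might look like, one option is scikit-learn's make_blobs; the cell below and its parameters (n_samples=300, centers=4, random_state=42) are illustrative assumptions rather than the book's exact code:

In [2]: from sklearn.datasets import make_blobs
...     # four well-separated blobs of 2D points; the ground-truth labels
...     # y_true are generated but deliberately not used in the plot
...     X, y_true = make_blobs(n_samples=300, centers=4,
...                            cluster_std=1.0, random_state=42)
...     plt.scatter(X[:, 0], X[:, 1], s=50)

Plotting the points without any color-coding keeps the figure label-free, which is exactly the view the clustering algorithm will get.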