Introduction to k-means Clustering
Hopefully, by now, you can see that finding clusters is extremely valuable in a machine learning workflow. But how do you actually find these clusters? One of the most basic yet popular approaches is k-means clustering. k-means searches for K clusters in your data, and its workflow is quite intuitive. We will start with a no-math introduction to k-means, followed by an implementation in Python.
No-Math k-means Walkthrough
Here is the k-means clustering algorithm, without the math:
- Pick K centroids, where K is the number of distinct clusters you expect to find.
- Randomly place the K centroids among your existing training data.
- Calculate the Euclidean distance from each centroid to every point in your training data.
- Assign each training data point to its nearest centroid.
- For each centroid, calculate the mean of the data points assigned to it and move your...
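The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a definitive implementation, and it rests on a few assumptions that are not in the original text: the random placement picks K distinct training points as starting centroids, the last step moves each centroid to the mean of its assigned points, and the whole process repeats until the centroids stop moving (or `max_iters` rounds pass). The function names `assign_to_nearest` and `kmeans` and the `init` parameter (which lets you supply starting centroids so a run is reproducible) are hypothetical names chosen for this example.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Return the index of the nearest centroid (Euclidean distance) for each row of X."""
    # distances has shape (n_points, k): entry [i, j] is ||X[i] - centroids[j]||.
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(X, k, init=None, max_iters=100, seed=None):
    """Minimal k-means sketch; returns (centroids, labels).

    Assumptions (not from the text): if the hypothetical `init` argument is
    None, the K starting centroids are K distinct training points chosen at
    random, and the loop stops once the centroids stop moving.
    """
    X = np.asarray(X, dtype=float)
    if init is None:
        # Steps 1-2: place K centroids at randomly chosen training points.
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
    else:
        centroids = np.asarray(init, dtype=float)

    for _ in range(max_iters):
        # Steps 3-4: measure distances and group each point with its nearest centroid.
        labels = assign_to_nearest(X, centroids)

        # Step 5 (assumed continuation): move each centroid to the mean of its points.
        # A centroid that ends up with no points simply stays where it is.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Stop when no centroid moved (up to floating-point tolerance).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final grouping against the final centroid positions.
    return centroids, assign_to_nearest(X, centroids)


# Usage: two well-separated groups of 2-D points, starting from one point in each group.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
centroids, labels = kmeans(X, k=2, init=X[[0, 3]])
print(labels)  # [0 0 0 1 1 1]
```

Because the starting centroids are random when `init` is omitted, different seeds can converge to different groupings, and an unlucky start can produce a poor one. That is why the example passes explicit starting points to keep the output reproducible.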