Chapter 3: Neighborhood Approaches and DBSCAN
Activity 4: Implement DBSCAN from Scratch
Solution:
- Generate a random cluster dataset as follows:
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

X_blob, y_blob = make_blobs(n_samples=500, centers=4, n_features=2, random_state=800)
- Visualize the generated data:
plt.scatter(X_blob[:,0], X_blob[:,1])
plt.show()
The output is as follows:
Figure 3.14: Plot of generated data
- Create functions from scratch that allow you to call DBSCAN on a dataset:
def scratch_DBSCAN(x, eps, min_pts):
    """
    param x (list of vectors): your dataset to be clustered
    param eps (float): neighborhood radius threshold
    param min_pts (int): minimum number of points threshold for a neighborhood to be a cluster
    """
    # Build a label holder that is comprised of all 0s
    labels = [0] * x.shape[0]
    # Arbitrary starting...
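The listing above is cut off, so as a point of comparison, here is a minimal, self-contained sketch of how a from-scratch DBSCAN can be put together. It is not the book's solution: the names `region_query` and `sketch_dbscan`, the use of -1 for noise, and the tiny `demo` array are all assumptions made for illustration. It reuses the idea from the listing of a label holder that starts as all 0s (meaning "not yet visited").

```python
import numpy as np

def region_query(x, idx, eps):
    """Return indices of all points within eps of point idx (itself included)."""
    dists = np.linalg.norm(x - x[idx], axis=1)
    return np.flatnonzero(dists <= eps).tolist()

def sketch_dbscan(x, eps, min_pts):
    """Hypothetical helper: 0 = unvisited, -1 = noise, 1..k = cluster id."""
    x = np.asarray(x, dtype=float)
    labels = [0] * x.shape[0]   # label holder, all 0s to start
    cluster_id = 0
    for i in range(x.shape[0]):
        if labels[i] != 0:
            continue            # already assigned to a cluster or marked noise
        neighbors = region_query(x, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1      # provisionally noise; may become a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id   # former noise becomes a border point
            if labels[j] != 0:
                continue                 # already handled
            labels[j] = cluster_id
            j_neighbors = region_query(x, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core, keep expanding
    return labels

# Assumed toy data: two tight groups of three points and one isolated point.
demo = np.array([[0, 0], [0, 0.1], [0.1, 0],
                 [5, 5], [5, 5.1], [5.1, 5],
                 [20, 20]])
labels = sketch_dbscan(demo, eps=0.5, min_pts=3)
print(labels)  # [1, 1, 1, 2, 2, 2, -1]
```

The same function can be called on the generated data, for example `sketch_dbscan(X_blob, eps=0.6, min_pts=5)`, where the parameter values are arbitrary starting guesses rather than tuned settings. The brute-force distance computation in `region_query` keeps the sketch short at the cost of quadratic running time.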