Agglomerative Clustering
Let's consider the following dataset:
$$X = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\} \quad \text{where } \bar{x}_i \in \mathbb{R}^m$$
We define affinity, a metric function of two arguments with the same dimensionality, m. The most common metrics (also supported by scikit-learn) are the following:
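- Euclidean or L2 (Minkowski distance with p=2):
  $$d_{Euclidean}(\bar{x}_1, \bar{x}_2) = \|\bar{x}_1 - \bar{x}_2\|_2 = \sqrt{\sum_{i=1}^{m} \left(x_1^{(i)} - x_2^{(i)}\right)^2}$$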
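- Manhattan (also known as city block) or L1 (Minkowski distance with p=1):
  $$d_{Manhattan}(\bar{x}_1, \bar{x}_2) = \|\bar{x}_1 - \bar{x}_2\|_1 = \sum_{i=1}^{m} \left|x_1^{(i)} - x_2^{(i)}\right|$$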
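- Cosine distance:
  $$d_{Cosine}(\bar{x}_1, \bar{x}_2) = 1 - \frac{\bar{x}_1 \cdot \bar{x}_2}{\|\bar{x}_1\|_2 \, \|\bar{x}_2\|_2}$$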
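The three metrics listed above can be sketched in a few lines of plain Python. This is only an illustration of the definitions, not scikit-learn's implementation; the function names (`euclidean`, `manhattan`, `cosine_distance`) and the sample vectors are hypothetical choices made for this example.

```python
import math

def euclidean(a, b):
    # L2 distance: square root of the sum of squared component differences
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    # L1 distance: sum of absolute component differences
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between a and b
    # (undefined if either vector has zero norm)
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return 1.0 - dot / (norm_a * norm_b)

# Two illustrative points with the same dimensionality (m = 2)
a = [1.0, 2.0]
b = [4.0, 6.0]

print("Euclidean:", euclidean(a, b))
print("Manhattan:", manhattan(a, b))
print("Cosine:   ", cosine_distance(a, b))
```

For these two points, the component differences are 3 and 4, so the Euclidean distance is 5 and the Manhattan distance is 7. The cosine distance is close to zero because the two vectors point in nearly the same direction.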
The Euclidean distance is normally a good choice, but sometimes it's useful to have a metric whose difference from the Euclidean one grows as the points move farther apart. As discussed in Chapter 9, Clustering Fundamentals, the Manhattan metric has this property: for a point (x, x), the Euclidean distance from the origin is √2·|x|, while the Manhattan distance is 2·|x|, so the gap between the two increases linearly with |x|. The following graph plots the distances from the origin of points belonging to the line y = x:

Distances of the point (x, x) from (0, 0) using the Euclidean and Manhattan metrics
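The behavior shown in the plot can also be checked numerically. The following is a minimal sketch using only the standard library; the helper name `distances_from_origin` and the sampled x values are assumptions made for this example.

```python
import math

def distances_from_origin(x):
    # Point (x, x) on the line y = x, measured from (0, 0)
    euclid = math.hypot(x, x)       # equals sqrt(2) * |x|
    manhattan = abs(x) + abs(x)     # equals 2 * |x|
    return euclid, manhattan

rows = []
for x in [1.0, 5.0, 10.0, 50.0]:
    e, m = distances_from_origin(x)
    rows.append((x, e, m, m - e))
    print(f"x={x:5.1f}  Euclidean={e:8.3f}  Manhattan={m:8.3f}  gap={m - e:8.3f}")

# The gap (Manhattan minus Euclidean) for each sampled x
gaps = [row[3] for row in rows]
```

The ratio between the two distances stays constant at √2, but the absolute gap keeps growing with x, which is the property the text refers to.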
The cosine distance is instead useful when we need a distance that depends only on the angle between two vectors, regardless of their magnitudes. If the direction is the same, the distance is zero, while it is the maximum when...