Computing pairwise distances from a dataset, using different distance metrics
This section covers computing pairwise distances from a dataset and some of its applications.
How to do it…
To do this, we need to consider the following points:
- Any algorithm that performs a search needs a good set of distance functions. For this purpose, SciPy has a large collection of efficiently coded functions in the distance submodule of the scipy.spatial module.
- The list is long. Besides Euclidean, squared Euclidean, or standardized Euclidean, we have many more: Bray-Curtis, Canberra, Chebyshev, Manhattan, correlation distance, cosine distance, Dice dissimilarity, Hamming, Jaccard-Needham, Kulsinski, Mahalanobis, and so on.
- The syntax in most cases is simple:
distance_function(first_vector, second_vector)
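As a minimal sketch of this two-argument form, the following calls a few of the functions in scipy.spatial.distance on two small vectors. The vectors u and v and the variable names are made up for illustration, and note that the Manhattan distance is exposed under the name cityblock.

```python
import numpy as np
from scipy.spatial import distance

# Two illustrative vectors (hypothetical data, chosen to be orthogonal unit vectors)
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])

# Each function takes the two vectors and returns a single float
d_euclidean = distance.euclidean(u, v)      # straight-line distance
d_sqeuclidean = distance.sqeuclidean(u, v)  # squared Euclidean distance
d_manhattan = distance.cityblock(u, v)      # Manhattan (city block) distance
d_chebyshev = distance.chebyshev(u, v)      # largest coordinate difference
d_cosine = distance.cosine(u, v)            # 1 minus the cosine of the angle

print(d_euclidean, d_sqeuclidean, d_manhattan, d_chebyshev, d_cosine)
```

Because every one of these functions shares the same signature, swapping one metric for another in an algorithm usually means changing only the function name.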
The only three cases in which the syntax is different are the Minkowski, Mahalanobis, and standardized Euclidean distances, in which the distance function requires either...