Classification metrics
Earlier in the chapter, we explored choosing the best of several nearest neighbors instances by varying the number of neighbors, the n_neighbors parameter. This is the main parameter in nearest neighbors classification: a point is classified based on the labels of its k nearest neighbors. So, for 3-nearest neighbors, a point is classified by a majority vote over the labels of its three nearest points.
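The majority vote can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the scikit-learn implementation: the function name knn_predict, the toy points, and the use of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    # Distance from the query to every training point
    distances = [math.dist(point, query) for point in train_points]
    # Indices of the k closest training points
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Count the labels among those neighbors and return the most common one
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two points labeled 0 near the origin, three labeled 1 near (1, 1)
train_points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_labels = [0, 0, 1, 1, 1]

prediction = knn_predict(train_points, train_labels, (0.8, 0.9), k=3)
print(prediction)
```

With k=3, the query point (0.8, 0.9) has three neighbors that all carry label 1, so the vote is unanimous. A query near the origin would pick up two neighbors labeled 0 and one labeled 1, and the majority would still decide the class.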
The classification metric in this case was the internal metric accuracy_score, which is defined as the number of correct classifications divided by the total number of classifications. There are alternative metrics, and we will explore them here.
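As a small worked example of the accuracy definition, the ratio can be computed by hand. The label lists below are made up for illustration and do not come from the diabetes data; scikit-learn's accuracy_score(y_true, y_pred) from sklearn.metrics should return the same ratio for these inputs.

```python
# Hypothetical true labels and predictions, for illustration only
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count the positions where the prediction matches the true label
n_correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)

# Accuracy = correct classifications / total classifications
accuracy = n_correct / len(y_true)
print(n_correct, len(y_true), accuracy)  # 6 8 0.75
```

Six of the eight predictions match, giving an accuracy of 0.75. Note that this single number does not say whether the two mistakes were false positives or false negatives, which is one reason alternative metrics are worth considering.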
Getting ready
- To start, load the Pima diabetes dataset from the UCI repository:
import pandas as pd

data_web_address = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"

column_names = ['pregnancy_x', 'plasma_con', 'blood_pressure', 'skin_mm', 'insulin', 'bmi', 'pedigree_func...