Selecting a model with cross-validation
We saw automatic cross-validation with the cross_val_score function in Chapter 1, High-Performance Machine Learning – NumPy. This recipe is very similar, except that we will use only the last two columns of the iris dataset as the data. The purpose of this section is to select the best model we can.
Before starting, we will define the best model as the one that scores the highest. If there happens to be a tie, we will choose the model whose scores have the least volatility (the lowest standard deviation).
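As a preview of this selection rule, the following is a minimal sketch (not the recipe's own code) that compares two KNN candidates by their mean cross-validation score and breaks ties with the standard deviation of the scores. The choice of cv=10 folds is an assumption, and for brevity the sketch cross-validates on the full dataset rather than on a training split:

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data[:, 2:], iris.target

# Candidate models: KNN with 3 and with 5 neighbors (cv=10 is an assumed choice).
results = {}
for k in (3, 5):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10)
    results[k] = (scores.mean(), scores.std())

# Highest mean score wins; on a tie, prefer the lower standard deviation.
best_k = max(results, key=lambda k: (results[k][0], -results[k][1]))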
Getting ready
In this recipe we will do the following:
- Load the last two features (columns) of the iris dataset
- Split the data into training and testing data
- Instantiate two k-nearest neighbors (KNN) classifiers, one with three and one with five neighbors
- Score both algorithms
- Select the model that scores the best
Start by loading the dataset:
from sklearn import datasets

iris = datasets.load_iris()
# Keep only the last two features: petal length (cm) and petal width (cm)
X = iris.data[:, 2:]
y = iris.target
Split the data into training and testing sets. The samples are stratified, so that the proportions of the target classes are the same in the training and testing sets.
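A minimal sketch of this split, assuming a 25% test split and a fixed random_state for reproducibility (both values are assumptions, not taken from the original recipe):

from sklearn.model_selection import train_test_split

# Stratified split: the class proportions of y are preserved in both sets.
# test_size and random_state are assumed values for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7)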