Optimizing an SVM
For this example we will continue with the iris dataset, but will use two classes that are harder to tell apart, the Versicolour and Virginica iris species.
In this section we will focus on the following:
- Setting up a scikit-learn pipeline: A chain of transformations with a predictive model at the end
- A grid search: A performance scan of several versions of SVMs with varying parameters
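To make these two ideas concrete before we walk through the recipe, here is a minimal sketch of a pipeline wrapped in a grid search. The step names (scaler, svc), the RBF kernel, the candidate values for C and gamma, and the five-fold cross-validation are assumptions chosen for illustration, not necessarily the settings used in the rest of this recipe.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two features and the two harder-to-separate classes (labels 1 and 2)
iris = datasets.load_iris()
X = iris.data[iris.target != 0][:, :2]
y = iris.target[iris.target != 0]

# Pipeline: a scaling transformation followed by a predictive model.
# The step names "scaler" and "svc" are illustrative choices.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Grid search: score every combination of the listed parameter values.
# Parameters are addressed as <step name>__<parameter name>.
# These candidate values are assumptions for illustration only.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": [0.01, 0.1, 1],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)
print(grid.best_score_)
```

Because the scaler sits inside the pipeline, it is refit on the training portion of each cross-validation fold, so no information from the held-out fold leaks into the scaling step.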
Getting ready
Load two classes and two features of the iris dataset:
# load the libraries we have been using
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X_w = iris.data[:, :2]  # load the first two features of the iris data
y_w = iris.target  # load the target of the iris data

X = X_w[y_w != 0]
y = y_w[y_w != 0]

X_1 = X[y == 1]
X_2 = X[y == 2]
How to do it...
- Begin by splitting the data into training and testing sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split...
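The arguments to train_test_split are not shown above, so the following is only a sketch of how such a split could look. The values test_size=0.25, stratify=y, and random_state=7 are assumptions made for illustration, not the settings used in this recipe.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Rebuild the two-class, two-feature data from the Getting ready step
iris = datasets.load_iris()
X_w = iris.data[:, :2]
y_w = iris.target
X = X_w[y_w != 0]
y = y_w[y_w != 0]

# Hypothetical settings: hold out 25% of the samples, keep the class
# balance the same in both sets, and fix the seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7
)

print(X_train.shape, X_test.shape)  # (75, 2) (25, 2)
```

Stratifying on y is a reasonable default here because we have only 100 samples, and an unlucky random split could otherwise leave one species under-represented in the test set.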