Reducing dimensionality with PCA
Now it's time to take the math up a level! Principal component analysis (PCA) is the first somewhat advanced technique discussed in this book. While everything so far has been simple statistics, PCA combines statistics and linear algebra to produce a preprocessing step that reduces dimensionality. High dimensionality can be the enemy of a simple model.
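To make the "statistics plus linear algebra" idea concrete, here is a minimal NumPy sketch of the textbook covariance and eigendecomposition route to PCA. The toy data and all variable names are hypothetical, chosen only for illustration, and this is not necessarily how scikit-learn computes PCA internally.

```python
import numpy as np

# Hypothetical toy data: 200 samples of 3 strongly correlated features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent, 2 * latent, -latent]) + 0.1 * rng.normal(size=(200, 3))

# Statistics: center each feature and estimate the covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Linear algebra: eigendecomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort them descending instead.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of the total variance carried by each principal component.
explained_ratio = eigvals / eigvals.sum()

# Dimensionality reduction: project onto the first principal component.
X_reduced = X_centered @ eigvecs[:, :1]

print(explained_ratio)
print(X_reduced.shape)
```

Because the three features in this toy dataset are driven by a single underlying variable, almost all of the variance ends up in the first component, which is the situation where dropping the remaining components costs very little information.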
Getting ready
PCA is a member of the decomposition module of scikit-learn. There are several other decomposition methods available, which will be covered later in this recipe. Let's use the iris dataset, but it's better if you use your own data:
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline

iris = datasets.load_iris()
iris_X = iris.data
y = iris.target
How to do it...
- Import the decomposition module:
from sklearn import decomposition
- Instantiate a default PCA object:
pca = decomposition.PCA()
pca

PCA(copy=True, iterated_power='auto', n_components=None, random_state...
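As a rough sketch of what a default PCA object does once it is fitted, the following self-contained snippet reloads the iris data, fits the object, and inspects the share of variance each component explains. The name iris_pca is an assumption introduced here for illustration.

```python
from sklearn import datasets, decomposition

# Reload the iris features so the snippet runs on its own.
iris_X = datasets.load_iris().data

# With n_components=None, PCA keeps every component (four for iris).
pca = decomposition.PCA()

# Fit the model and project the data onto the principal components.
iris_pca = pca.fit_transform(iris_X)

print(iris_pca.shape)
# Fraction of the total variance explained by each component, largest first.
print(pca.explained_variance_ratio_)
```

With the default settings no dimensions are dropped yet; the explained variance ratios are what you would typically look at before deciding how many components to keep.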