Chapter 4: Dimension Reduction and PCA
Activity 6: Manual PCA versus scikit-learn
Solution
- Import the pandas, numpy, and matplotlib plotting libraries and the scikit-learn PCA model:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
- Load the dataset and select only the sepal features as per the previous exercises. Display the first five rows of the data:
df = pd.read_csv('iris-data.csv')
df = df[['Sepal Length', 'Sepal Width']]
df.head()
The output is as follows:
Figure 4.43: The first five rows of the data
- Compute the covariance matrix for the data:
cov = np.cov(df.values.T)
cov
The output is as follows:
Figure 4.44: The covariance matrix for the data
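As a rough check on what np.cov returns in the step above, the sketch below computes the same matrix by hand. It uses synthetic two-column data as a stand-in, because iris-data.csv is not assumed to be available here; the names data, cov_numpy, and cov_manual are illustrative and do not come from the activity.

```python
import numpy as np

# Synthetic stand-in for the two sepal columns (hypothetical values, not the Iris data).
rng = np.random.default_rng(0)
data = rng.normal(loc=[5.8, 3.0], scale=[0.8, 0.4], size=(150, 2))

# np.cov treats each row as a variable, hence the transpose.
cov_numpy = np.cov(data.T)

# Manual computation: centre each column, then X^T X / (n - 1).
centered = data - data.mean(axis=0)
cov_manual = centered.T @ centered / (data.shape[0] - 1)

print(cov_numpy.shape)
print(np.allclose(cov_numpy, cov_manual))
```

The transpose matters: without it, np.cov would treat each of the 150 samples as a variable and return a 150 x 150 matrix instead of the 2 x 2 matrix of feature covariances.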
- Transform the data using the scikit-learn API and only the first principal component. Store the transformed data in the sklearn_pca variable:
model = PCA(n_components=1)
sklearn_pca = model.fit_transform(df.values)
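To illustrate what this scikit-learn call produces, here is a minimal sketch that compares it with a projection built from the eigenvectors of the covariance matrix. It is not necessarily how the remaining steps of the activity proceed. The data is synthetic (an assumption, since the CSV is not bundled here), and manual_pca and first_component are hypothetical names.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the two sepal columns (hypothetical values, not the Iris data).
rng = np.random.default_rng(0)
data = rng.normal(loc=[5.8, 3.0], scale=[0.8, 0.4], size=(150, 2))

# scikit-learn: keep only the first principal component.
model = PCA(n_components=1)
sklearn_pca = model.fit_transform(data)

# Manual equivalent: centre the data, then project onto the eigenvector
# of the covariance matrix with the largest eigenvalue.
centered = data - data.mean(axis=0)
cov = np.cov(centered.T)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
first_component = eigenvectors[:, -1]
manual_pca = centered @ first_component

print(sklearn_pca.shape)
# The sign of a principal component is arbitrary, so compare magnitudes only.
print(np.allclose(np.abs(sklearn_pca[:, 0]), np.abs(manual_pca)))
```

The comparison uses absolute values because the two routes may return components that point in opposite directions. Both are valid, since a principal axis is defined only up to sign.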
- Transform the...