Performing PCA from scratch in Python takes three steps:
- Compute the correlation or covariance matrix of the given dataset.
- Find the eigenvalues and eigenvectors of that correlation or covariance matrix.
- Multiply the centered dataset by the eigenvector matrix to obtain the principal component matrix.
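In matrix notation, the three steps can be summarized as follows. The symbols are assumptions introduced here for illustration: X is the n x p data matrix (rows are observations), x-bar is the vector of column means, X_c is the centered data, C is the covariance matrix, V holds the eigenvectors as columns, Lambda is the diagonal matrix of eigenvalues, and Z is the principal component matrix. For the correlation variant, each column of X_c would additionally be divided by its standard deviation.

```latex
X_c = X - \mathbf{1}\,\bar{x}^{\top}, \qquad
C = \frac{1}{n-1}\, X_c^{\top} X_c \\
C\,V = V\,\Lambda \\
Z = X_c\,V
```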
Let's implement PCA from scratch:
- We will begin by importing libraries and defining the dataset:
# Import numpy
import numpy as np
# Import linear algebra module
from scipy import linalg as la
# Create dataset
data = np.array([[7., 4., 3.],
                 [4., 1., 8.],
                 [6., 3., 5.],
                 [8., 6., 1.],
                 [8., 5., 7.],
                 [7., 2., 9.],
                 [5., 3., 3.],
                 [9., 5., 8.],
                 [7., 4., 5.],
                 [8., 2., 2.]])
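- Center the data and calculate the covariance matrix:
# Center the data by subtracting each column's mean
data -= data.mean(axis=0)
# Calculate the covariance matrix (rowvar=False treats each column as a variable)
cov = np.cov(data, rowvar=False)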
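As a quick sanity check on this step, the sketch below recomputes the covariance matrix by hand and compares it with the result of `np.cov`. The names `centered`, `n_samples`, and `manual_cov` are illustrative and do not appear in the walkthrough; the dataset is repeated so that the snippet runs on its own.

```python
import numpy as np

# Same 10 x 3 dataset as in the walkthrough
data = np.array([[7., 4., 3.],
                 [4., 1., 8.],
                 [6., 3., 5.],
                 [8., 6., 1.],
                 [8., 5., 7.],
                 [7., 2., 9.],
                 [5., 3., 3.],
                 [9., 5., 8.],
                 [7., 4., 5.],
                 [8., 2., 2.]])

# Center each column (variable) on its mean without modifying data
centered = data - data.mean(axis=0)

# Sample covariance by hand: X_c^T X_c / (n - 1)
n_samples = centered.shape[0]
manual_cov = centered.T @ centered / (n_samples - 1)

# np.cov uses the same n - 1 denominator by default
cov = np.cov(centered, rowvar=False)

print(np.allclose(manual_cov, cov))   # do the two computations agree?
print(np.allclose(cov, cov.T))        # is the matrix symmetric?
```

Note that `np.cov` subtracts the column means internally, so the explicit centering does not change the covariance matrix. It matters later, when the centered data is projected onto the eigenvectors.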
- Calculate the eigenvalues and eigenvectors of the covariance matrix:
# Calculate the eigenvalues and eigenvectors of the covariance matrix
evals, evecs = la.eig(cov)
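The eigenpairs can be checked against the defining relation, covariance matrix times eigenvector equals eigenvalue times eigenvector. The sketch below is a variation rather than the walkthrough's exact call: it swaps in `la.eigh`, which is intended for symmetric matrices such as a covariance matrix and returns real eigenvalues in ascending order. The names `centered`, `order`, and `explained_ratio` are illustrative assumptions.

```python
import numpy as np
from scipy import linalg as la

# Same 10 x 3 dataset as in the walkthrough
data = np.array([[7., 4., 3.],
                 [4., 1., 8.],
                 [6., 3., 5.],
                 [8., 6., 1.],
                 [8., 5., 7.],
                 [7., 2., 9.],
                 [5., 3., 3.],
                 [9., 5., 8.],
                 [7., 4., 5.],
                 [8., 2., 2.]])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh: solver for symmetric matrices; eigenvalues come back real and ascending
evals, evecs = la.eigh(cov)

# Each column v_j of evecs should satisfy cov @ v_j = evals[j] * v_j
print(np.allclose(cov @ evecs, evecs * evals))

# Reorder so that the largest eigenvalue (most variance) comes first
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Share of the total variance carried by each component
explained_ratio = evals / evals.sum()
print(explained_ratio)
```

The eigenvalues sum to the trace of the covariance matrix, which is the total variance of the dataset, so `explained_ratio` sums to 1.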
- Multiply the original...