Extracting the right features from your data
Like all the machine learning models we have explored so far, dimensionality reduction models operate on a feature vector representation of our data.
For this chapter, we will dive into the world of image processing, using the Labeled Faces in the Wild (LFW) dataset of facial images. This dataset contains over 13,000 images of the faces of well-known public figures, generally taken from the Internet. Each face is labeled with the person's name.
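To make the idea of a feature vector representation concrete for images, the following is a minimal sketch, assuming a grayscale image is available as a list of pixel rows. The function name image_to_feature_vector and the toy image are hypothetical and used only for illustration; this is one simple way to build a vector from an image, not necessarily the approach taken later in the chapter.

```python
def image_to_feature_vector(pixels):
    """Flatten a 2D grayscale image (a list of rows) into a 1D list of floats.

    Hypothetical helper for illustration: each pixel intensity becomes
    one feature, read row by row.
    """
    if not pixels or any(len(row) != len(pixels[0]) for row in pixels):
        raise ValueError("expected a non-empty image with rows of equal width")
    return [float(p) for row in pixels for p in row]


# A toy 2 x 3 "image" standing in for a real face image (made-up values).
toy_image = [
    [0, 64, 128],
    [255, 32, 16],
]
vector = image_to_feature_vector(toy_image)
print(len(vector))  # 6
```

An image of height h and width w therefore yields a vector with h * w features, which is why even small images produce high-dimensional data.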
Extracting features from the LFW dataset
To avoid having to download and process a very large dataset, we will work with a subset of the images, using people whose names start with an A. This dataset can be downloaded from http://vis-www.cs.umass.edu/lfw/lfw-a.tgz.
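Once the archive has been downloaded, it needs to be unpacked and the image files located. The following is a rough sketch using only the Python standard library. It assumes the archive is a gzip-compressed tar file and that it unpacks into one directory per person containing .jpg files; that layout is an assumption here, and the helper names extract_archive and list_images_by_person are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a gzip-compressed tar archive into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)


def list_images_by_person(root_dir):
    """Group .jpg paths under root_dir by the name of their parent directory.

    Assumes (not verified here) that each parent directory is named
    after the person shown in the images it contains.
    """
    images = {}
    for path in sorted(Path(root_dir).rglob("*.jpg")):
        images.setdefault(path.parent.name, []).append(path)
    return images
```

For example, after calling extract_archive("lfw-a.tgz", "data") with a local copy of the archive, list_images_by_person("data") would return a dictionary mapping each directory name to the list of image paths found in it.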
Note
For more details and other variants of the data, visit http://vis-www.cs.umass.edu/lfw/. The original research paper reference is: Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled Faces in...