Using random forests for face recognition
A popular dataset that we haven't talked much about yet is the Olivetti face dataset.
The Olivetti face dataset was collected between April 1992 and April 1994 at AT&T Laboratories Cambridge. The dataset comprises facial images of 40 distinct subjects, taken at different times and under different lighting conditions. In addition, subjects varied their facial expression (open/closed eyes, smiling/not smiling) and their facial details (glasses/no glasses).
Images were then quantized to 256 grayscale levels and stored as unsigned 8-bit integers. Because there are 40 distinct subjects, the dataset comes with 40 distinct target labels. Recognizing faces thus constitutes an example of a multiclass classification task.
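To make the quantization step concrete, here is a minimal sketch of what storing an image at 256 gray levels as unsigned 8-bit integers looks like in NumPy. The image is random stand-in data of an arbitrary size, not a real face, and the variable names are chosen for illustration only. The conversion back to floating point mirrors what scikit-learn's loader does when it rescales the stored values to the interval [0, 1].

```python
import numpy as np

# Hypothetical stand-in image: random floats in [0, 1), arbitrary 8 x 8 size.
rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Quantize to 256 gray levels (0..255) and store as unsigned 8-bit integers.
quantized = np.round(image * 255).astype(np.uint8)

# Convert back to floating point in [0, 1], as a loader might do.
restored = quantized.astype(np.float32) / 255.0

# Rounding to the nearest gray level loses at most half a level.
max_error = np.abs(restored - image).max()
print(quantized.dtype, quantized.min(), quantized.max())
print("largest round-trip error:", max_error)
```

The round-trip error stays below half a gray level (0.5 / 255), which is why 8 bits per pixel are usually enough for grayscale face images.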
Loading the dataset
Like many other classic datasets, the Olivetti face dataset can be loaded using scikit-learn:
In [1]: from sklearn.datasets import fetch_olivetti_faces
...     dataset = fetch_olivetti_faces()
In [2]: X = dataset.data
...     y = dataset.target
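Before training a classifier, it can be worth confirming that the label vector really describes a multiclass problem. The sketch below uses a hypothetical helper, `describe_labels`, and a synthetic label vector so that it runs without downloading anything. The assumption of 10 images per subject is used here only to build the stand-in data.

```python
import numpy as np

def describe_labels(y):
    """Return the number of distinct classes and the sample count per class."""
    classes, counts = np.unique(y, return_counts=True)
    return len(classes), counts

# Synthetic stand-in for dataset.target (assumed: 40 subjects, 10 images each).
y_demo = np.repeat(np.arange(40), 10)

n_classes, counts = describe_labels(y_demo)
print("classes:", n_classes)
print("samples per class (min, max):", counts.min(), counts.max())
```

With the real data, the same helper could be called as `describe_labels(y)` on the target vector loaded above.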
Although...