A more complex dataset and the nearest-neighbor classifier
We will now look at a slightly more complex dataset. Along the way, we will introduce a new classification algorithm and a few other ideas.
Learning about the seeds dataset
We now look at another agricultural dataset, which is still small, but already too large to plot exhaustively on a page as we did with the Iris dataset. This dataset consists of measurements of wheat seeds. There are seven features, which are as follows:
- Area A
- Perimeter P
- Compactness C = 4πA/P²
- Length of kernel
- Width of kernel
- Asymmetry coefficient
- Length of kernel groove
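Note that compactness is not an independent measurement: it is derived from the area and the perimeter. As a minimal sketch of that formula (the function name `compactness` is our own choice for illustration and is not part of the dataset or of any library), it could be computed as follows:

```python
import math


def compactness(area, perimeter):
    """Return C = 4*pi*A / P**2 for a shape with the given area and perimeter."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


# A circle of radius 1 has A = pi and P = 2*pi, which gives C = 1.
circle_c = compactness(math.pi, 2.0 * math.pi)

# A unit square has A = 1 and P = 4, which gives C = pi / 4.
square_c = compactness(1.0, 4.0)
```

With this definition, a perfectly round kernel outline would score 1, and more elongated or irregular outlines score lower.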
There are three classes corresponding to three wheat varieties: Canadian, Kama, and Rosa. As before, the goal is to classify the variety based on these morphological measurements. Unlike the Iris dataset, which was collected in the 1930s, this is a very recent dataset and its features were automatically computed from digital images.
This is how image pattern recognition...