Scaling data to the standard normal distribution
A recommended pre-processing step is to scale columns to the standard normal. The standard normal is probably the most important distribution in statistics. If you've ever been introduced to statistics, you have almost certainly seen z-scores. In truth, that's all this recipe is about: transforming our features from their endowed distribution into z-scores.
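As a quick illustration of the underlying math (using a toy array, not the recipe's dataset), a z-score is just the value minus the column mean, divided by the column standard deviation:

```python
import numpy as np

# A toy feature column.
x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# z-score transform: subtract the mean, divide by the standard deviation.
z = (x - x.mean()) / x.std()

# After the transform, the column has mean ~0 and standard deviation 1.
print(z.mean())
print(z.std())
```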
Getting ready
Scaling data is extremely useful. Many machine learning algorithms perform differently (and sometimes incorrectly) when features exist at different scales. For example, SVMs perform poorly if the data isn't scaled because their optimization uses a distance function, which is biased when one feature varies from 0 to 10,000 and another varies from 0 to 1.
The preprocessing module contains several useful functions for scaling features:
from sklearn import preprocessing
import numpy as np  # we'll need it later
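As a sketch of what these functions do (on a small toy matrix rather than the dataset used in this recipe), preprocessing.scale standardizes each column to zero mean and unit variance:

```python
from sklearn import preprocessing
import numpy as np

# A toy feature matrix with columns on different scales.
X = np.array([[1.0, -10.0, 200.0],
              [2.0,   0.0,   0.0],
              [0.0,  10.0, -50.0]])

# Column-wise z-score scaling.
X_scaled = preprocessing.scale(X)

# Each column now has mean ~0 and standard deviation 1.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

For pipelines, the equivalent StandardScaler class stores the fitted means and standard deviations so the same transform can be reapplied to new data.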
Load the Boston dataset...