Putting it all together with pipelines
Now that we've seen pipelines and data transformation techniques individually, we'll walk through a more complicated example that combines several of the previous recipes into a single pipeline.
Getting ready
In this section, we'll show off more of the pipeline's power. When we used it earlier to impute missing values, it was only a quick taste; here, we'll chain together multiple preprocessing steps to show how pipelines can remove extra work. Let's briefly load the iris dataset and seed it with some missing values:
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
iris_data = iris.data

# Randomly mask roughly 25% of the values as missing
mask = np.random.binomial(1, .25, iris_data.shape).astype(bool)
iris_data[mask] = np.nan

iris_data[:5]

array([[ nan,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  nan],
       [ nan,  3.2,  nan,  nan],
       [ nan,  nan,  1.5,  0.2],
       [ nan,  3.6,  1.4,  0.2]])
How to do it...
The goal of this chapter is to first impute...
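As a rough sketch of where this is headed, a pipeline that first imputes the missing values and then applies a second preprocessing step might look like the following. This assumes scikit-learn's `SimpleImputer` and `StandardScaler` (older scikit-learn versions exposed imputation as `sklearn.preprocessing.Imputer` instead); the exact steps the recipe chains together may differ:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Recreate the dataset with missing values, as above
iris = load_iris()
iris_data = iris.data
mask = np.random.binomial(1, .25, iris_data.shape).astype(bool)
iris_data[mask] = np.nan

# Chain imputation and scaling into a single estimator;
# each step's output feeds the next step's input
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
transformed = pipe.fit_transform(iris_data)
```

Calling `fit_transform` on the pipeline runs both steps in order, so `transformed` contains no NaN values and each column is standardized, without any intermediate bookkeeping on our part.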