Creating binary features through thresholding
In the last recipe, we looked at standardizing our data to zero mean and unit variance. Now, we'll talk about another transformation, one that is quite different. Instead of rescaling the distribution, we'll purposely throw away data; if we have good reason, this can be a very smart move. Data that is ostensibly continuous often contains discontinuities, and binary features can capture them.
Additionally, note that in the previous chapter, we turned a classification problem into a regression problem. With thresholding, we can go the other way and turn a regression problem into a classification problem. This is useful when the question is whether a value exceeds some cutoff rather than what the value is.
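To make the idea concrete, here is a minimal sketch of thresholding a continuous target into a binary one. The values in y are made up for illustration, and using the mean as the cutoff is an assumption; any cutoff that makes sense for the problem would work the same way.

```python
import numpy as np

# Hypothetical continuous target values (made up for illustration).
y = np.array([12.5, 30.1, 22.4, 18.0, 45.7, 22.5])

# Assumed cutoff: the mean of the target. This choice is arbitrary.
threshold = y.mean()

# Values strictly above the threshold become 1; everything else becomes 0.
y_binary = (y > threshold).astype(int)

print(y_binary)
```

After this step, y_binary can serve as a class label, so a problem that started as regression on y becomes a two-class classification problem. The magnitude information in y is lost, which is the "throwing away data" trade-off described above.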
Getting ready
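Creating binary features and outcomes is a very useful method, but it should be used with caution, because the information discarded cannot be recovered. Let's use the Boston dataset to learn how to turn values into binary outcomes. First, load the Boston dataset (load_boston is available only in scikit-learn versions before 1.2):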
import numpy as np
from sklearn.datasets import load_boston
boston...