Extracting the right features from your data
As the underlying models for regression are the same as those for the classification case, we can use the same approach to create input features. The only practical difference is that the target is now a real-valued variable rather than a categorical one. The LabeledPoint class in the ML library already takes this into account: its label field is of the Double type, so it can handle both cases.
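The point about a single floating-point label covering both cases can be sketched in plain Python. The LabeledRecord class and make_record helper below are hypothetical stand-ins written for this illustration; they are not Spark's LabeledPoint, and the feature values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledRecord:
    """Hypothetical stand-in for a labeled point: a float label plus features."""
    label: float
    features: List[float]


def make_record(label, features):
    # Coerce the label and every feature to float, so that class indices and
    # real-valued targets share one representation.
    return LabeledRecord(float(label), [float(x) for x in features])


# Classification: the label is a class index, stored as a float.
class_example = make_record(1, [0.0, 3.0, 1.0])

# Regression: the label is a real-valued target.
regression_example = make_record(16.5, [0.0, 3.0, 1.0])

print(class_example.label, regression_example.label)
```

Because the label is always a float, the same feature extraction code can feed either kind of model; only the meaning of the label changes.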
Extracting features from the bike sharing dataset
To illustrate the concepts in this chapter, we will use the bike sharing dataset. This dataset contains hourly records of the number of bicycle rentals in the Capital Bikeshare system. It also contains variables related to date, time, weather, season, and holidays.
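As a rough sketch of what separating features from the target might look like for one hourly record, consider the following. The column names (including cnt as the rental count) and the single row of values are assumptions made up for this illustration; check the header of the real file before relying on them.

```python
import csv
import io

# Assumed header and one made-up row, for illustration only.
SAMPLE = """hr,holiday,workingday,temp,hum,windspeed,cnt
8,0,1,0.5,0.6,0.1,120
"""

TARGET = "cnt"  # assumed name of the rental-count column


def extract(row):
    # Every column except the target becomes a numeric feature.
    features = [float(v) for k, v in row.items() if k != TARGET]
    # The target is kept as a real-valued label.
    label = float(row[TARGET])
    return label, features


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
label, features = extract(rows[0])
print(label, features)
```

This treats every variable as a raw number. Categorical variables such as the hour of day would normally need a different encoding before being passed to a linear model.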
Note
The dataset is available at http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset. Click on the Data Folder link, and then download the Bike-Sharing-Dataset.zip file. The bike sharing data was enriched with weather and...