Using decision trees for regression
Decision trees for regression work very much like decision trees for classification. The procedure for developing a regression model consists of four parts:
- Load the dataset
- Split the set into training/testing subsets
- Instantiate a decision tree regressor and train it
- Score the model on the test subset
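The four steps above can be sketched as follows. This is a minimal example, assuming scikit-learn's default DecisionTreeRegressor hyperparameters and an arbitrary test_size/random_state; in practice you would tune settings such as max_depth:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# 1. Load the dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# 2. Split into training/testing subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 3. Instantiate a decision tree regressor and train it
dtr = DecisionTreeRegressor(random_state=0)
dtr.fit(X_train, y_train)

# 4. Score the model on the test subset (R^2 by default)
score = dtr.score(X_test, y_test)
```

Note that `score` here returns the coefficient of determination (R^2), not classification accuracy.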
Getting ready
For this example, load scikit-learn's diabetes dataset:
#Use within a Jupyter notebook
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target
X_feature_names = ['age', 'gender', 'body mass index', 'average blood pressure',
                   'bl_0', 'bl_1', 'bl_2', 'bl_3', 'bl_4', 'bl_5']
Now that we have loaded the dataset, we must split the data into training and testing subsets. Before doing that, however, visualize the target variable using pandas:
pd.Series(y).hist(bins=50)
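The train/test split mentioned above can be done with scikit-learn's train_test_split; the test_size and random_state values here are arbitrary choices for reproducibility, not recommendations:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Hold out a quarter of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
```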

This is a regression example, and we cannot use...