Tuning gradient boosting trees
We will examine the California housing dataset with gradient boosting trees. Our overall approach will be the same as before:
- Focus on the most important parameters of the gradient boosting algorithm:
  - max_features
  - max_depth
  - min_samples_leaf
  - learning_rate
  - loss
- Create a parameter distribution where the most important parameters are varied.
- Perform a randomized grid search. Since gradient boosting is an ensemble, keep the number of estimators low at first so the search runs quickly.
- Use the best parameters from the previous step with many estimators.
Getting ready
Load the California housing dataset and split the loaded dataset into training and testing sets:
%matplotlib inline
from __future__ import division  # use true division under Python 2.7
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_california_housing

cali_housing = fetch_california_housing()
X = cali_housing.data
y = cali_housing.target

#bin output variable to split training and testing sets into...
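The binning hinted at by the final comment can be sketched as below: the continuous target is cut into quantile bins so the train/test split can be stratified on them. Random stand-in arrays replace the housing data here, and the bin count and test fraction are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-ins for cali_housing.data / cali_housing.target (illustration only)
rng = np.random.RandomState(0)
X = rng.rand(1000, 8)
y = rng.lognormal(size=1000)

# Cut the continuous target into 10 quantile bins (bin count is an assumption)
bins = pd.qcut(y, q=10, labels=False)

# Stratify the split on the bins so train and test share the target distribution
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=bins, random_state=7
)
```

Stratifying on binned values is one common way to keep a skewed regression target balanced across the two sets; a plain random split works too when the target is roughly uniform.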