Implementing random forest regression
Random forest is an ensemble algorithm. Ensemble algorithms combine several models so that the combined prediction is better than any single model's prediction on its own. Scikit-learn offers several ensemble algorithms, most of which use decision trees as their base estimators. Let's start by expanding on decision tree regression with several decision trees working together in a random forest.
A random forest is a collection of decision trees, where each tree contributes a single vote toward the prediction. The random forest produces its output by averaging the predictions of all the trees it contains.
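To see this averaging in action, here is a minimal sketch (not part of the recipe itself) showing that a fitted RandomForestRegressor's prediction equals the mean of its individual trees' predictions; the small n_estimators value and the use of the full diabetes dataset here are assumptions made purely for illustration:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)

# Fit a small forest on the whole dataset just to inspect its trees.
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Average the per-tree predictions by hand...
tree_average = np.mean([tree.predict(X) for tree in forest.estimators_], axis=0)

# ...and confirm it matches the forest's own prediction.
print(np.allclose(tree_average, forest.predict(X)))   # True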
Getting ready
Load the diabetes regression dataset as we did with decision trees. Split all of the data into training and testing sets:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

# Human-readable feature names. The last six features of the diabetes dataset
# are blood serum measurements; the labels used for them here are chosen for
# readability, not taken from the dataset itself.
X_feature_names = ['age', 'gender', 'body mass index', 'average blood pressure',
                   'blood serum 1', 'blood serum 2', 'blood serum 3',
                   'blood serum 4', 'blood serum 5', 'blood serum 6']
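The excerpt above stops before the split mentioned in the instructions. A minimal continuation, assuming the standard train_test_split from scikit-learn (the random_state value is an arbitrary choice for reproducibility), would be:

from sklearn.model_selection import train_test_split

# Hold out a test set for evaluating the forest later.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)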