Cross-validation with ShuffleSplit
ShuffleSplit is one of the simplest cross-validation techniques: for each of the specified number of iterations, it simply takes a random sample of the data.
Getting ready
To use ShuffleSplit, we specify the total number of elements in the dataset, and it takes care of the rest. We'll walk through an example of estimating the mean of a univariate dataset. This is similar to resampling, but it illustrates why we want to use cross-validation while also showing how cross-validation works.
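As a rough illustration of the mechanics, the sketch below runs ShuffleSplit on a ten-element toy array. It assumes a recent scikit-learn release, where the class is imported from sklearn.model_selection and the data is passed to split() (older releases instead took the dataset size in the constructor). The toy array and the values n_splits=3, test_size=0.5, and random_state=0 are illustrative choices, not values from this recipe.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy data, purely illustrative: ten elements labelled 0..9.
data = np.arange(10)

# Assumed settings: 3 iterations, each holding out a random half of the data.
splitter = ShuffleSplit(n_splits=3, test_size=0.5, random_state=0)

# Each iteration yields two disjoint arrays of indices into `data`.
splits = list(splitter.split(data))
for train_idx, test_idx in splits:
    print("train:", np.sort(train_idx), "test:", np.sort(test_idx))
```

Each iteration reshuffles the data independently, so the same element can land in the held-out half on more than one iteration.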
How to do it...
- First, we need to create the dataset. We'll use NumPy to create a dataset in which we know the underlying mean. We'll sample half of the dataset to estimate the mean and see how close it is to the underlying mean. Generate a normally distributed random sample with a mean of 1,000 and a scale (standard deviation) of 10:
    %matplotlib inline
    import numpy as np
    true_mean = 1000
    true_std = 10
    N = 1000
    dataset = np.random.normal...
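The idea of sampling half of the dataset to estimate the mean can be sketched with NumPy alone. This is a minimal sketch under stated assumptions rather than the recipe's own listing: the seeded generator `rng` and the names `half_idx`, `holdout`, `estimate`, and `error` are hypothetical, and the half is drawn without replacement from a random permutation.

```python
import numpy as np

true_mean = 1000
true_std = 10
N = 1000

# Assumption: a seeded generator so the sketch is reproducible.
rng = np.random.default_rng(0)
dataset = rng.normal(loc=true_mean, scale=true_std, size=N)

# Draw a random half of the indices without replacement.
half_idx = rng.permutation(N)[: N // 2]
holdout = dataset[half_idx]

# Estimate the mean from the sampled half and compare it to the known mean.
estimate = holdout.mean()
error = abs(estimate - true_mean)
print(f"estimated mean: {estimate:.2f}, absolute error: {error:.2f}")
```

With 500 points and a standard deviation of 10, the standard error of the estimate is about 10 / sqrt(500) ≈ 0.45, so the estimate should typically fall within about one unit of the true mean.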