Time series cross-validation
scikit-learn can perform cross-validation for time series data such as stock market data. We will do so with a time series split, because we want the model to predict the future from the past, with no information leaking from the future into the training data.
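To make the leak concrete, here is a minimal sketch that contrasts an ordinary `KFold` with `TimeSeriesSplit` on eight samples assumed to be in chronological order. The data and the helper `trains_on_future` are hypothetical, introduced only for this illustration; they are not part of scikit-learn or of the recipe.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Eight observations, assumed to be ordered in time (illustrative data).
X = np.arange(8).reshape(-1, 1)


def trains_on_future(splitter, X):
    """Return True if any fold trains on a sample later than its earliest test sample."""
    return any(train.max() > test.min() for train, test in splitter.split(X))


# Plain k-fold holds out an early block and trains on everything after it.
kfold_leaks = trains_on_future(KFold(n_splits=4), X)

# A time series split only ever trains on samples that precede the test samples.
tscv_leaks = trains_on_future(TimeSeriesSplit(n_splits=4), X)

print("KFold trains on future samples:", kfold_leaks)
print("TimeSeriesSplit trains on future samples:", tscv_leaks)
```

With unshuffled `KFold`, the first fold tests on the earliest samples and trains on all later ones, which is the kind of leak a time series split is meant to avoid.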
Getting ready
We will create the indices for a time series split. Start by creating a small toy dataset:
from sklearn.model_selection import TimeSeriesSplit
import numpy as np

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4, 1, 2, 3, 4])
How to do it...
- Now create a time series split object:
tscv = TimeSeriesSplit(n_splits=7)
- Iterate through it:
for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print("Training indices:", train_index, "Testing indices:", test_index)

Training indices: [0] Testing indices: [1]
Training indices: [0 1] Testing indices: [2]
Training indices: [0 1 2...
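In practice, the split object is usually passed as the `cv` argument of a scoring helper rather than iterated by hand. The following is a minimal sketch under stated assumptions: `LinearRegression`, the mean absolute error metric, and `n_splits=3` are choices made for illustration and do not come from the recipe.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Toy dataset from the recipe, assumed to be in chronological order.
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4, 1, 2, 3, 4])

# Assumption: three splits, so each test fold holds two samples.
tscv = TimeSeriesSplit(n_splits=3)

# Each fold fits on an expanding window of past samples and scores on the next block.
# scikit-learn reports error metrics negated, so values closer to zero are better.
scores = cross_val_score(
    LinearRegression(), X, y, cv=tscv, scoring="neg_mean_absolute_error"
)

print("Scores per split:", scores)
```

Because the training window grows with each split, later folds are fitted on more history than earlier ones, so the per-split scores are not directly comparable in the way ordinary k-fold scores are.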