K-fold cross validation
In the quest to find the best model, you can inspect the indices of the cross-validation folds and see exactly which samples fall into each training and testing split.
Getting ready
Create a very small toy dataset:
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8],
              [1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 1, 2, 1, 2, 1, 2])
How to do it...
- Import KFold and select the number of splits:
from sklearn.model_selection import KFold

kf = KFold(n_splits=4)
- You can iterate through the generator and print out the indices:
cc = 1
for train_index, test_index in kf.split(X):
    print("Round", cc, ": Training indices :", train_index,
          "Testing indices :", test_index)
    cc += 1

Round 1 : Training indices : [2 3 4 5 6 7] Testing indices : [0 1]
Round 2 : Training indices : [0 1 4 5 6 7] Testing indices : [2 3]
Round 3 : Training indices : [0 1 2 3 6 7] Testing indices : [4 5]
Round 4 : Training indices : [0 1 2 3 4 5] Testing indices : [6 7]
You can see, for example, that in the first round the first two samples are held out for testing while the remaining six are used for training, and that across the four rounds each sample appears in exactly one test fold.
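Once you have the fold indices, you can use them to train and score a model on each split. The following is a minimal sketch under the assumption that you want a per-fold accuracy; the choice of KNeighborsClassifier is only a placeholder and any scikit-learn estimator could be substituted:

# A minimal sketch: score a placeholder classifier on each fold.
# KNeighborsClassifier is an illustrative choice, not part of the recipe.
from sklearn.neighbors import KNeighborsClassifier

scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))  # accuracy on the held-out fold

print("Per-fold accuracy :", scores)
print("Mean accuracy :", np.mean(scores))

Note that KFold does not shuffle by default, which is why the test indices above are contiguous blocks; passing shuffle=True (optionally with random_state for reproducibility) randomizes which samples land in each fold. If you only need the scores and not the indices themselves, cross_val_score(model, X, y, cv=kf) from sklearn.model_selection collapses this loop into a single call.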