December 2018
Beginner to intermediate
684 pages
21h 9m
English
The sklearn.model_selection.KFold iterator produces several disjoint splits and assigns each of them to the validation set exactly once, as shown in the following code:
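from sklearn.model_selection import KFold

# data is assumed to hold 10 samples, for example data = list(range(10))
kf = KFold(n_splits=5)
for train, validate in kf.split(data):
    print(train, validate)

[2 3 4 5 6 7 8 9] [0 1]
[0 1 4 5 6 7 8 9] [2 3]
[0 1 2 3 6 7 8 9] [4 5]
[0 1 2 3 4 5 8 9] [6 7]
[0 1 2 3 4 5 6 7] [8 9]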
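The two properties described above (every sample lands in the validation set exactly once, and training and validation indices never overlap within a split) can be checked directly. The following is a minimal sketch, assuming a toy dataset of 10 samples; the variable names covers_all and no_overlap are illustrative only:

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.arange(10)  # assumption: a toy dataset with 10 samples
kf = KFold(n_splits=5)

# Collect the validation indices produced by each of the five splits
validation_folds = [validate for _, validate in kf.split(data)]
all_validation = np.concatenate(validation_folds)

# Every index should appear in exactly one validation fold
covers_all = sorted(all_validation.tolist()) == list(range(len(data)))

# Within each split, training and validation indices should not overlap
no_overlap = all(
    set(train).isdisjoint(validate) for train, validate in kf.split(data)
)

print(covers_all, no_overlap)
```

If either flag came out False, the splitter would be leaking validation samples into training or leaving some samples untested.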
In addition to the number of splits, most CV objects take a shuffle argument that randomizes the order of the samples before splitting. To make the results reproducible, also set random_state, as follows:
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train, validate in kf.split(data):
    print(train, validate)

[0 2 3 4 5 6 7 9] [1 8]
[1 2 3 4 6 7 8 9] [0 5]
[0 1 3 4 5 6 8 9] [2 7]
[0 1 2 3 5 ...
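To illustrate what reproducibility means here, the sketch below builds the shuffled splitter twice with the same seed and compares the resulting validation folds. It assumes a toy dataset of 10 samples; the helper validation_folds is hypothetical and not part of scikit-learn:

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.arange(10)  # assumption: a toy dataset with 10 samples


def validation_folds(seed):
    """Return the validation indices of each split for a given seed (illustrative helper)."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [validate.tolist() for _, validate in kf.split(data)]


# Two independently constructed splitters with the same seed yield identical folds
first_run = validation_folds(42)
second_run = validation_folds(42)
same_seed_matches = first_run == second_run

print(same_seed_matches)
print(first_run[0])
```

Shuffling changes which samples share a fold, but each sample is still validated exactly once; fixing random_state only pins down which shuffle is used.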