December 2018
Beginner to intermediate
684 pages
21h 9m
English
Because the data form a time series, standard cross-validation, which assigns observations to folds without regard to their order, will use data from the future to predict data from the past. This is unrealistic at best and data snooping at worst, to the extent that future data reflects past events.
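To make the look-ahead problem concrete, the following minimal sketch applies plain k-fold splitting to ten observations assumed to be in chronological order. The toy array and the names data, kf, and leaky_folds are illustrative assumptions rather than part of the original example. It flags every fold whose training set contains an observation later than the earliest validation observation:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: ten observations in chronological order.
data = np.arange(10)

kf = KFold(n_splits=5)  # default: no shuffling, contiguous validation folds
leaky_folds = 0
for train, validate in kf.split(data):
    # A fold looks ahead if any training index comes after
    # the earliest validation index.
    leaks = train.max() > validate.min()
    leaky_folds += int(leaks)
    print(train, validate, "uses future data" if leaks else "no future data")
```

Only the fold that validates on the final observations trains exclusively on earlier data; every other fold fits the model on observations that come after the ones it is asked to predict.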
To address time dependency, the sklearn.model_selection.TimeSeriesSplit object implements a walk-forward test with an expanding training set, where subsequent training sets are supersets of past training sets, as shown in the following code:
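from sklearn.model_selection import TimeSeriesSplit

# data holds 10 observations in time order, consistent with the output below
tscv = TimeSeriesSplit(n_splits=5)
for train, validate in tscv.split(data):
    print(train, validate)

[0 1 2 3 4] [5]
[0 1 2 3 4 5] [6]
[0 1 2 3 4 5 6] [7]
[0 1 2 3 4 5 6 7] [8]
[0 1 2 3 4 5 6 7 8] [9]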
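The two properties described above, that no training index follows a validation index and that each training set contains the previous one, can be checked directly. This is a minimal sketch on an assumed ten-observation array; the names splits, no_lookahead, and expanding are illustrative:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical toy data: ten observations in chronological order.
data = np.arange(10)

tscv = TimeSeriesSplit(n_splits=5)
splits = list(tscv.split(data))

# Every training index precedes every validation index.
no_lookahead = all(train.max() < validate.min() for train, validate in splits)

# Each training set is a superset of the one before it.
expanding = all(
    set(prev_train) <= set(next_train)
    for (prev_train, _), (next_train, _) in zip(splits, splits[1:])
)
print(no_lookahead, expanding)
```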
You can ...