December 2018
Each library has its own data format that precomputes feature statistics to accelerate the search for split points, as described previously. These datasets can also be persisted to disk so that subsequent training runs start faster.
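As a rough illustration of the idea, and not the actual binary format of any of these libraries, the following pure-Python sketch precomputes quantile-style bin thresholds for a single feature, replaces the raw values with bin indices, and caches the result on disk. All names here (`compute_bin_edges`, `bin_feature`, `save_binned`, `load_binned`, the toy `feature` values, and the pickle file) are hypothetical and chosen only for this example.

```python
import bisect
import pickle
import tempfile
from pathlib import Path


def compute_bin_edges(values, n_bins):
    """Return up to n_bins - 1 ascending thresholds taken at evenly spaced ranks."""
    ordered = sorted(values)
    edges = []
    for i in range(1, n_bins):
        edge = ordered[i * len(ordered) // n_bins]
        if not edges or edge > edges[-1]:  # skip duplicate thresholds
            edges.append(edge)
    return edges


def bin_feature(values, edges):
    """Map each raw value to the index of the bin it falls into."""
    return [bisect.bisect_right(edges, v) for v in values]


def save_binned(path, edges, codes):
    """Persist thresholds and bin indices (pickle is an assumption of this sketch)."""
    with open(path, 'wb') as f:
        pickle.dump({'edges': edges, 'codes': codes}, f)


def load_binned(path):
    """Reload the cached thresholds and bin indices."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Toy feature column (made-up values for illustration only)
feature = [0.3, 2.5, 1.1, 4.0, 3.2, 0.9, 2.8, 1.7]
edges = compute_bin_edges(feature, n_bins=4)
codes = bin_feature(feature, edges)

# Cache the precomputed representation and read it back
cache_file = Path(tempfile.mkdtemp()) / 'binned_feature.pkl'
save_binned(cache_file, edges, codes)
restored = load_binned(cache_file)
```

With the feature binned this way, a split search only has to evaluate `len(edges)` candidate thresholds instead of every distinct raw value, and reloading the cached file skips the sorting step on later runs.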
The following code constructs binary train and validation datasets for each model to be used with the OneStepTimeSeriesSplit:
cat_cols = ['year', 'month', 'age', 'msize', 'sector']
data = {}
for fold, (train_idx, test_idx) in enumerate(kfold.split(features)):
    print(fold, end=' ', flush=True)
    if model == 'xgboost':
        data[fold] = {'train': xgb.DMatrix(label=target.iloc[train_idx],
                                           data=features.iloc[train_idx],
                                           nthread=-1),  # use avail. threads
                      'valid': xgb.DMatrix(label=target.iloc[test_idx],