July 2017
Beginner to intermediate
378 pages
10h 26m
English
A popular validation method is cross-validation. It involves randomly dividing the training set into a number of equal-sized subsets of data rows, each of which is referred to as a fold. A model is then trained on all but one of the folds. The fold that the model was not trained on is held out as a validation set to check the error rate of the trained model.
Another model is then trained on all but one of the folds, this time holding out a different one. The held-out fold is used to check the prediction error of the model trained on the other folds. This continues until each fold has been held out exactly once, so that all the data is used for both training and validation. The resulting set of models will each run its own prediction ...
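The fold-by-fold procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `predict_mean`), the toy data, the choice of five folds, and the use of mean squared error as the error measure are all assumptions made for the example and do not come from the text.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and split them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, predict, seed=0):
    """Hold each fold out exactly once and return one validation error per fold."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for held_out in folds:
        held = set(held_out)
        # Train on every row that is not in the held-out fold.
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        # Measure mean squared error on the held-out fold only.
        sq_errs = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        errors.append(sum(sq_errs) / len(sq_errs))
    return errors


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
fold_errors = cross_validate(xs, ys, k=5, fit=fit_mean, predict=predict_mean)
print(fold_errors)  # one validation error per held-out fold
```

Any pair of `fit` and `predict` functions with the same shape could be passed in instead of the toy mean predictor. A common way to summarize the result is to average the per-fold errors into a single estimate.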