November 2018
Intermediate to advanced
300 pages
7h 42m
English
Cross-validation splits the dataset into k sets of approximately the same size—for example, in the following diagram, into five sets. First, we use sets 2 to 5 for learning and set 1 for training. We then repeat the procedure five times, leaving out one set at a time for testing, and average the error over the five repetitions:

This way, we use all of the data for learning and testing as well, while avoiding using the same data to train and test a model.