January 2016
Intermediate to advanced
416 pages
8h 54m
English
In the previous example, we validated our approach by withholding 30% of the data during training and testing on the withheld subset. This approach is not particularly rigorous: the exact result depends on the random train-test split. Furthermore, if we tested several different hyperparameters (or different models) to choose the best one, we would unwittingly choose the model that best fits the specific rows in our test set, rather than the population as a whole.
This can be overcome with cross-validation. We have already encountered cross-validation in Chapter 4, Parallel Collections and Futures. In that chapter, we used random subsample cross-validation, where we created the train-test split ...
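To make the idea concrete, here is a minimal sketch of random subsample cross-validation in plain Scala: we repeat the random train-test split several times and look at the spread of the test scores rather than trusting a single split. This is an illustration under stated assumptions, not the code from Chapter 4: the `Point` class, the threshold "model" in `train`, the synthetic data generator, and all function names are hypothetical stand-ins chosen to keep the example self-contained.

```scala
import scala.util.Random

object RandomSubsampleCV {
  // A labelled observation: one numeric feature and a binary label.
  final case class Point(x: Double, label: Boolean)

  // Synthetic data for illustration only (an assumption, not real data):
  // positives are centred on +1, negatives on -1, with unit Gaussian noise.
  def syntheticData(n: Int, seed: Long): Seq[Point] = {
    val rng = new Random(seed)
    Seq.fill(n) {
      val label = rng.nextBoolean()
      Point(rng.nextGaussian() + (if (label) 1.0 else -1.0), label)
    }
  }

  // Hypothetical stand-in model: predict positive when x exceeds the
  // midpoint between the two class means of the training set.
  def train(data: Seq[Point]): Double => Boolean = {
    def mean(ps: Seq[Point]): Double =
      if (ps.isEmpty) 0.0 else ps.map(_.x).sum / ps.size
    val (pos, neg) = data.partition(_.label)
    val threshold = (mean(pos) + mean(neg)) / 2.0
    x => x > threshold
  }

  // Fraction of rows for which the model predicts the correct label.
  def accuracy(model: Double => Boolean, data: Seq[Point]): Double =
    data.count(p => model(p.x) == p.label).toDouble / data.size

  // One random split: shuffle, then withhold `testFraction` of the rows.
  // Returns (training set, test set).
  def split(data: Seq[Point], testFraction: Double,
            rng: Random): (Seq[Point], Seq[Point]) = {
    val nTest = math.round(data.size * testFraction).toInt
    val (testSet, trainSet) = rng.shuffle(data).splitAt(nTest)
    (trainSet, testSet)
  }

  // Repeat the random split `nRepeats` times, training on each training
  // set and scoring on the matching test set. Returns one score per split.
  def crossValidate(data: Seq[Point], nRepeats: Int,
                    testFraction: Double, seed: Long): Seq[Double] = {
    val rng = new Random(seed)
    Seq.fill(nRepeats) {
      val (trainSet, testSet) = split(data, testFraction, rng)
      accuracy(train(trainSet), testSet)
    }
  }
}
```

Calling `RandomSubsampleCV.crossValidate(RandomSubsampleCV.syntheticData(200, 42L), 10, 0.3, 1L)` returns ten accuracies, one per random split. The individual scores will generally differ from split to split, which is exactly the variability described above; their mean (`scores.sum / scores.size`) gives a steadier estimate than any single split.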