Understanding cross-validation

Cross-validation is a method of evaluating the generalization performance of a model that is generally more stable and thorough than a single split of the dataset into training and test sets.

The most commonly used version of cross-validation is k-fold cross-validation, where k is a number specified by the user (usually five or ten). Here, the dataset is partitioned into k parts of more or less equal size, called folds. For a dataset that contains N data points, each fold should thus have approximately N / k samples. Then a series of models is trained on the data, using k - 1 folds for training and one remaining fold for testing. The procedure is repeated for k iterations, each time choosing a different fold for testing, ...
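The splitting procedure described above can be sketched in plain Python. This is a minimal illustration, not the book's own code: `train_and_score` is a hypothetical callback that trains a model on the training folds and returns its test-fold score.

```python
def k_fold_indices(n_samples, k):
    """Partition the indices 0..n_samples-1 into k folds of near-equal size.

    The first n_samples % k folds receive one extra sample, so every
    fold has either floor(N/k) or ceil(N/k) samples.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, labels, train_and_score, k=5):
    """Train k models, each time holding out one fold for testing.

    train_and_score is a hypothetical user-supplied function
    (X_train, y_train, X_test, y_test) -> score; the k scores are returned.
    """
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # All folds except fold i form the training set.
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        X_train = [data[j] for j in train_idx]
        y_train = [labels[j] for j in train_idx]
        X_test = [data[j] for j in test_idx]
        y_test = [labels[j] for j in test_idx]
        scores.append(train_and_score(X_train, y_train, X_test, y_test))
    return scores
```

In practice you would not hand-roll this; libraries such as scikit-learn provide `KFold` and `cross_val_score` for the same purpose. Note that real data should usually be shuffled (or stratified) before splitting, which this sketch omits for clarity.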
