K-fold cross-validation

K-fold cross-validation gives a much better estimate of our model's performance than a single train-test split. Here's how it works:

We split our data into some number of equal-sized sections (usually 3, 5, or 10). Call this number k.

  1. For each "fold" of the cross-validation, we will treat k-1 of the sections as the training set, and the remaining section as our test set.
  2. For the remaining folds, a different arrangement of k-1 sections is considered for our training set and a different section is our training set.
  3. We compute a set metric for each fold of the cross-validation.
  4. We average our scores at the end.

Cross-validation effectively performs multiple train-test splits on the same dataset. ...
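As a rough sketch of the procedure above in scikit-learn (the iris dataset and logistic regression model here are illustrative assumptions, not choices from the text):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
import numpy as np

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

kf = KFold(n_splits=5, shuffle=True, random_state=0)  # k = 5
scores = []
for train_idx, test_idx in kf.split(X):
    # Steps 1-2: k-1 sections train the model; the remaining section tests it
    model.fit(X[train_idx], y[train_idx])
    # Step 3: compute the chosen metric (accuracy here) on the held-out fold
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

# Step 4: average the per-fold scores for the final estimate
print(np.mean(scores))
```

In practice, scikit-learn's `cross_val_score(model, X, y, cv=5)` wraps this entire loop in a single call.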
