
Effective Amazon Machine Learning by Alexis Perrier


What is cross-validation?

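A performance estimate obtained from a single split depends on how the data happens to be distributed between the training and evaluation subsets of that split. To lower this dependence, the idea is to run several independent trials, each with a different data split, and average the results. This is called cross-validation.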

In practice, the model performance is averaged across K trials, where each trial is built on a different split of the original dataset. There are many strategies for splitting the dataset. The most common one is called k-fold cross-validation: the dataset is split into K chunks and, for each trial, K-1 chunks are combined to train the model while the remaining chunk is used to evaluate it, so that each chunk serves as the evaluation set exactly once. Another strategy, called leave-one-out (LOO), takes this idea to its extreme by setting K to the number of samples. ...
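
To make the k-fold procedure concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions rather than how Amazon Machine Learning performs the split internally: the helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `mse`), the toy dataset, and the trivial "predict the training mean" model are all hypothetical and chosen only to keep the example self-contained.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs, one pair per trial."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once so that chunks do not follow the original ordering.
    random.Random(seed).shuffle(indices)
    # Chunk i takes every k-th shuffled index; chunk sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training set: the other K-1 chunks combined.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(xs, ys, fit, score, k):
    """Average the evaluation score over the K trials."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(
            score(model, [xs[j] for j in test_idx], [ys[j] for j in test_idx])
        )
    return mean(scores)


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return mean(ys)


# Mean squared error of the constant prediction on the held-out chunk.
def mse(model, xs, ys):
    return mean((y - model) ** 2 for y in ys)


xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]

five_fold_score = cross_validate(xs, ys, fit_mean, mse, k=5)
# Leave-one-out is the special case where K equals the number of samples.
loo_score = cross_validate(xs, ys, fit_mean, mse, k=len(xs))
print(five_fold_score, loo_score)
```

Passing `k=len(xs)` yields leave-one-out, at the cost of training the model once per sample. Any real model could replace `fit_mean` and `mse`, provided it follows the same assumed `fit(xs, ys)` and `score(model, xs, ys)` signatures.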
