Cross-validation, which we introduced in Chapter 2, Declaring the Objectives, is a very important step in building any predictive model.

While there are many different kinds of cross-validation methods, the basic idea is that the data scientist repeats the following process a number of times:

Train me, test me, split me:

  1. Do the train-test split.
  2. Fit the model to the train set.
  3. Test the model on the test set.
  4. Calculate and review the prediction error.
  5. Repeat (n times).
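
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that do not come from the text: the model is a simple straight-line fit, the prediction error is the mean squared error, the split is a random 75/25 holdout, and the data is synthetic. The names `fit_line` and `repeated_holdout_error` are hypothetical helpers written for this example, not part of any library.

```python
import random
import statistics


def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def repeated_holdout_error(xs, ys, n_repeats=10, test_fraction=0.25, seed=0):
    """Repeat split/fit/test n_repeats times; return (errors, average error)."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(1, int(len(xs) * test_fraction))
    errors = []
    for _ in range(n_repeats):  # step 5: repeat n times
        # Step 1: do the train-test split (random shuffle, then cut).
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        # Step 2: fit the model to the train set.
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Step 3: test the model on the test set.
        squared = [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
        # Step 4: calculate the prediction error (mean squared error here).
        errors.append(statistics.fmean(squared))
    return errors, statistics.fmean(errors)


# Synthetic data for illustration only: y = 2 + 3x plus Gaussian noise.
data_rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0, 1) for x in xs]

errors, average_error = repeated_holdout_error(xs, ys)
print("per-repeat errors:", [round(e, 3) for e in errors])
print("average error:", round(average_error, 3))
```

Repeating a random split like this is one of the many kinds of cross-validation (often called repeated holdout); k-fold cross-validation follows the same loop but partitions the data so that each observation is used for testing exactly once.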

By repeating the preceding process a number of times, the data scientist can calculate the average error, which is then used to assess how the statistical model is performing (performance ...
