Cross-validation, which we introduced in Chapter 2, Declaring the Objectives, is a very important step in building any predictive model.
While there are many different kinds of cross-validation methods, the basic idea is that the data scientist repeats the following process a number of times:
Train me, test me, split me:
- Do the train-test split.
- Fit the model to the train set.
- Test the model on the test set.
- Calculate and review the prediction error.
- Repeat (n times).
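The steps above can be sketched in a few lines of code. What follows is only a minimal illustration using repeated random train-test splits, written in Python with the standard library alone. The toy data, the simple straight-line model, the choice of mean squared error as the prediction error, and the names `fit_line` and `repeated_holdout` are all assumptions made for the example, not part of any particular library or of the method described in this chapter.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def repeated_holdout(xs, ys, n_repeats=10, test_fraction=0.25, seed=0):
    """Repeat split/fit/test n_repeats times; return the errors and their average."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(1, int(len(xs) * test_fraction))
    errors = []
    for _ in range(n_repeats):
        # Do the train-test split (a fresh random split on every repeat).
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        # Fit the model to the train set.
        intercept, slope = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        # Test the model on the test set and calculate the prediction error
        # (mean squared error here, which is an assumption of this sketch).
        mse = mean((ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx)
        errors.append(mse)
    return errors, mean(errors)


# Hypothetical toy data: a straight line plus random noise.
data_rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0, 1) for x in xs]

errors, average_error = repeated_holdout(xs, ys, n_repeats=10)
print("error per repeat:", [round(e, 3) for e in errors])
print("average error:", round(average_error, 3))
```

Reviewing the individual errors as well as their average is worthwhile: a wide spread across repeats suggests that the model's measured performance depends heavily on which observations happen to land in the test set.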
By repeating the preceding process a number of times, the data scientist can calculate the average prediction error, which is then used to assess how the statistical model is performing (performance ...