Chapter 3. Offline Evaluation Mechanisms: Hold-Out Validation, Cross-Validation, and Bootstrapping

Now that we’ve discussed the metrics, let’s re-situate ourselves in the machine learning model workflow that we unveiled in Figure 1-1. We are still in the prototyping phase, the stage where we tweak everything: features, model types, training methods, and so on. Let’s dive a little deeper into model selection.

Unpacking the Prototyping Phase: Training, Validation, Model Selection

Each time we tweak something, we come up with a new model. Model selection is the process of picking the model (or type of model) that best fits the data, and it is done using validation results, not training results. Figure 3-1 gives a simplified view of this mechanism.
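To make this concrete, here is a minimal sketch of model selection, assuming scikit-learn: two candidate model types are trained on the training split, and whichever scores higher on the validation split is selected. The candidate models, split ratio, and synthetic dataset are all illustrative choices, not prescriptions from this book.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the "available historical dataset."
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Two candidate model types; in practice there may be many more.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}

# Train each candidate on training data, score it on validation data.
val_scores = {
    name: model.fit(X_train, y_train).score(X_val, y_val)
    for name, model in candidates.items()
}
best_name = max(val_scores, key=val_scores.get)
print(val_scores, "-> selected:", best_name)

The key point is that the winner is decided by the validation scores alone; the training scores never enter the comparison.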

Figure 3-1. The prototyping phase of building a machine learning model

In Figure 3-1, hyperparameter tuning is illustrated as a “meta” process that controls the training process. We’ll discuss exactly how it is done in Chapter 4. Take note that the available historical dataset is split into two parts: training and validation. The model training process receives training data and produces a model, which is evaluated on validation data. The results from validation are passed back to the hyperparameter tuner, which tweaks some knobs and trains the model again.
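The loop in Figure 3-1 can be sketched in a few lines, again assuming scikit-learn. Here the “tuner” is just a grid over a single hyperparameter (the maximum depth of a decision tree); each setting is trained on the training split, scored on the validation split, and the best-scoring setting is kept. The grid values are hypothetical; Chapter 4 covers more sophisticated tuning strategies.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Split the historical data into training and validation sets.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

best_depth, best_score = None, -1.0
for depth in [2, 4, 8, 16]:            # the knobs the tuner tweaks
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)        # the training process
    score = model.score(X_val, y_val)  # evaluation on validation data
    if score > best_score:             # validation result fed back to the tuner
        best_depth, best_score = depth, score
print(f"best max_depth={best_depth}, validation accuracy={best_score:.3f}")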

The question is, why must the model be evaluated on two different datasets? The short answer is that the model has already seen the training data. Measuring performance there rewards memorization: a sufficiently flexible model can fit the training set almost perfectly while generalizing poorly to new data. Held-out validation data gives an estimate of how the model will behave on data it has never encountered, which is what we actually care about.
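To see the gap in numbers, here is a small sketch, again assuming scikit-learn: an unconstrained decision tree nearly memorizes a noisy synthetic training set, so its training accuracy is far more flattering than its validation accuracy. The dataset and noise level are made up for illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset with some label noise, so memorization hurts.
X, y = make_classification(n_samples=1000, flip_y=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# An unconstrained tree can fit the training data almost perfectly...
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("training accuracy:  ", tree.score(X_train, y_train))  # close to 1.0
# ...but the validation score reveals how much of that was memorization.
print("validation accuracy:", tree.score(X_val, y_val))      # noticeably lower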
