Given that the goal of predictive analytics is to build generalizable models that predict well for data yet unobserved, we should ideally be testing our models on data unseen, and check our predictions against the observed outcomes. The problem with that, of course, is that we don't know the outcomes of data unseen—that's why we want a predictive model. We do, however, have a trick up our sleeve, called the validation set approach.
The validation set approach is a technique to evaluate a model's ability to perform well on an independent dataset. But instead of waiting to get our hands on a completely new dataset, we simulate a new dataset with the one we already have.
The main idea is that we can split our dataset into two ...