Overfitting and cross validation
As you may remember from Chapters 2, 3, and 4, one problem with our model-building methodology was that we were guilty of overfitting. Overfitting, the bane of predictive analytics, is what happens when we build a model that does a great job on past data but then falls apart when new data is introduced. This phenomenon is not unique to data science; it happens a lot in our society: professional athletes get lucrative contracts and then fail to live up to their prior performances, fund managers get hefty salary bumps because of last year's performance and then underperform, and the list goes on.
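To make the idea concrete, here is a minimal sketch in Python with NumPy. It is not the tooling or data used in the earlier chapters; the data set is invented (a straight-line trend plus random noise), and the names `x_train`, `x_test`, `mse`, and `results` are illustrative assumptions. It fits a simple model and a deliberately over-flexible one to the same "past" data, then scores both on points that were held back.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a straight-line trend plus random noise.
x = np.linspace(0, 1, 30)
y = 2.0 * x + rng.normal(scale=0.3, size=x.size)

# Hold back every third point to play the role of "new" data.
holdout = np.arange(x.size) % 3 == 0
x_train, y_train = x[~holdout], y[~holdout]
x_test, y_test = x[holdout], y[holdout]


def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


# Fit a simple model (degree 1) and an over-flexible one (degree 9)
# using only the training points, then score each on both sets.
results = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (
        mse(coeffs, x_train, y_train),  # error on past data
        mse(coeffs, x_test, y_test),    # error on held-back data
    )

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree}: train MSE = {train_err:.4f}, "
          f"holdout MSE = {test_err:.4f}")
```

The flexible model will always match the training points at least as closely as the straight line, because it has more freedom to chase the noise. The number to watch is the holdout error: when it is noticeably worse than the training error, the model has likely memorized the past rather than learned the trend.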
Cross validation – train versus test
Unlike the Yankees, who never seem to learn, our profession has learned from its mistakes and ...