In high-dimensional settings, where many possible signals could be included in your model, you need to take care to select the model that best predicts future data and to avoid overfitting. To do this, you first use recipes that produce a good set of candidate models. You then choose among these candidates by minimizing an estimate of the error rate when predicting on new data. This chapter introduces the key tools for such high-dimensional modeling.
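
The two-step workflow described above (build candidate models, then pick the one with the lowest estimated error on new data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not necessarily the chapter's own method: the data are simulated, the candidates are simply "use the first k signals" for k = 0 to 20, the error estimate comes from 5-fold cross-validation on mean squared error, and every name (`fit_ols`, `design`, `cv_mse`, and so on) is hypothetical.

```python
import random

def fit_ols(X, y, ridge=1e-8):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination.
    The tiny ridge term is only a numerical safeguard (an assumption of this sketch)."""
    p = len(X[0])
    # Augmented matrix [X'X + ridge*I | X'y]
    A = [[sum(r[a] * r[b] for r in X) + (ridge if a == b else 0.0) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[c][p] / A[c][c] for c in range(p)]

def mse(X, y, beta):
    """Mean squared prediction error of coefficients beta on (X, y)."""
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y)) / len(y)

def design(rows, k):
    """Candidate model k: an intercept plus the first k signals."""
    return [[1.0] + r[:k] for r in rows]

def cv_mse(rows, y, k, folds=5):
    """K-fold cross-validated MSE for candidate model k."""
    n = len(rows)
    total = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        beta = fit_ols(design([rows[i] for i in train], k), [y[i] for i in train])
        total += len(test) * mse(design([rows[i] for i in test], k),
                                 [y[i] for i in test], beta)
    return total / n

# Simulated data (an assumption): 20 signals, only the first 3 matter.
random.seed(0)
n, p = 200, 20
rows = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * r[0] - 2.0 * r[1] + 2.0 * r[2] + random.gauss(0, 1) for r in rows]

# In-sample error uses the same data for fitting and evaluation.
in_sample = [mse(design(rows, k), y, fit_ols(design(rows, k), y)) for k in range(p + 1)]
# Out-of-sample error is estimated on held-out folds.
out_sample = [cv_mse(rows, y, k) for k in range(p + 1)]
best_k = min(range(p + 1), key=lambda k: out_sample[k])

print("largest model, in-sample MSE:", round(in_sample[p], 3))
print("largest model, cross-validated MSE:", round(out_sample[p], 3))
print("candidate chosen by cross-validation: k =", best_k)
```

Because the candidates are nested, in-sample error can only fall as signals are added, so it would always favor the largest model. The held-out estimate is what penalizes the noise signals, which is why selection is based on it.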

Out-of-Sample Performance

In the previous chapter, we introduced deviance as a measure of how tightly your model fits the training data. When you apply your models for prediction ...

