In this chapter we introduce linear regression models for the purpose of prediction. We discuss the differences between fitting and using regression models for the purpose of inference (as in classical statistics) and for prediction. A predictive goal calls for evaluating model performance on a validation set and for using predictive metrics. We then raise the challenges of using many predictors and describe variable selection algorithms that are often implemented in linear regression procedures.

The most popular model for making predictions is the *multiple linear regression model* encountered in most introductory statistics classes and textbooks. This model is used to fit a linear relationship between a quantitative *dependent variable Y* (also called the *outcome* or *response variable*) and a set of *predictors X*_{1}, *X*_{2}, ..., *X*_{p} (also referred to as *independent variables, input variables, regressors*, or *covariates*). The assumption is that in the population of interest, the following relationship holds:

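**Equation 6.1.**

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

where *β*_{0}, ..., *β*_{p} are coefficients and *ε* is the noise or unexplained part.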
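The sketch below makes Equation 6.1 concrete. It assumes the coefficients are estimated by ordinary least squares (the standard approach, although this excerpt has not yet described estimation), and the coefficient values, sample size, and noise level are hypothetical choices made only for illustration. Data are simulated from the equation, the model is fitted on a training portion, and predictive accuracy is then checked on held-out records. This reflects the chapter's point that a predictive goal calls for evaluation on a validation set.

```python
import numpy as np

def fit_ols(X, y):
    """Estimate beta_0..beta_p by ordinary least squares (assumed estimation method).

    X: (n, p) array of predictors; y: (n,) array of outcomes.
    Returns the array [beta_0, beta_1, ..., beta_p].
    """
    # Prepend a column of ones so that beta_0 plays the role of the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    """Apply the fitted linear model to new rows of predictors."""
    return beta[0] + X @ beta[1:]

# Simulate records from Equation 6.1 using hypothetical coefficients.
rng = np.random.default_rng(0)
n, p = 500, 3
true_beta = np.array([2.0, 1.5, -0.7, 0.3])   # hypothetical beta_0..beta_3
X = rng.normal(size=(n, p))
noise = rng.normal(scale=0.5, size=n)           # the epsilon (noise) term
y = true_beta[0] + X @ true_beta[1:] + noise

# Fit on the first 70% of records and hold out the rest as a validation set.
n_train = int(0.7 * n)
beta_hat = fit_ols(X[:n_train], y[:n_train])
resid = y[n_train:] - predict(beta_hat, X[n_train:])
rmse = np.sqrt(np.mean(resid ** 2))             # a common predictive metric

print("estimated coefficients:", np.round(beta_hat, 2))
print("validation RMSE:", round(float(rmse), 3))
```

With enough records, the estimated coefficients should land close to the hypothetical true ones. The validation RMSE should be near the noise standard deviation, because the noise is the part of *Y* that no linear combination of the predictors can explain.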

The two popular objectives behind fitting a model that relates a quantitative outcome with predictors are for understanding the ...
