As you saw in Chapter 3, “Predictive Model Building: Balancing Performance, Complexity, and Big Data,” getting linear regression to work in practice requires some manipulation of the ordinary least squares algorithm. Ordinary least squares regression cannot temper its use of all the data available in an attempt to minimize the error on the training data. Chapter 3 illustrated that this situation can lead to models that perform much worse on test data than on the training data. Chapter 3 showed two extensions of ordinary least squares regression: forward stepwise regression and ridge regression. Both of these involved judiciously reducing the amount of data available to ordinary least squares and using out-of-sample error measurement to determine how much data resulted in the best performance.
Stepwise regression began by letting ordinary least squares regression use exactly one of the attribute columns for making predictions and picking the best one. It then proceeded iteratively, adding one more attribute column at a time to those already being used in the model.
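The forward stepwise procedure can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the Chapter 3 code. The function names (ols_fit, mse, forward_stepwise) are hypothetical. The data is assumed to be already split into training and test sets. "Best" is taken to mean the lowest mean squared error on the held-out test set.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def mse(coef, X, y):
    """Mean squared prediction error of an intercept-plus-slopes model."""
    pred = coef[0] + X @ coef[1:]
    return np.mean((y - pred) ** 2)


def forward_stepwise(X_train, y_train, X_test, y_test):
    """Greedy forward selection (hypothetical helper for illustration).

    Start with no columns. At each step, try every unused column, fit OLS on
    the training data, and keep the column whose addition gives the lowest
    held-out error (the selection criterion is an assumption of this sketch).
    Returns the columns in the order added and the held-out error after each
    addition.
    """
    n_cols = X_train.shape[1]
    selected, errors = [], []
    for _ in range(n_cols):
        best_col, best_err = None, np.inf
        for j in range(n_cols):
            if j in selected:
                continue
            cols = selected + [j]
            coef = ols_fit(X_train[:, cols], y_train)
            err = mse(coef, X_test[:, cols], y_test)
            if err < best_err:
                best_col, best_err = j, err
        selected.append(best_col)
        errors.append(best_err)
    return selected, errors
```

To choose how many columns to keep, one would take the model size with the lowest held-out error, for example `np.argmin(errors) + 1`, and use the first that many entries of `selected`.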
Ridge regression introduced a different type of constraint: it imposed a penalty on the magnitude of the coefficients to constrain the solution. Both ridge regression and forward stepwise regression gave better performance than ordinary least squares (OLS) on example problems.
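The coefficient penalty can be made concrete with a short sketch. It assumes the penalized objective is the sum of squared errors plus alpha times the sum of squared coefficients, with the intercept left unpenalized. Under that assumption, centering the attributes and labels reduces the problem to solving (XcᵀXc + αI)β = Xcᵀyc, and the intercept is recovered from the means. The names ridge_fit and ridge_path_errors, and the parameter name alpha, are hypothetical choices for this illustration. With alpha = 0 the sketch reduces to ordinary least squares, and larger alpha values shrink the coefficients toward zero.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Minimize sum((y - b0 - X @ beta)**2) + alpha * sum(beta**2).

    Assumption of this sketch: the intercept b0 is not penalized, so X and y
    are centered, the penalized normal equations are solved for beta, and b0
    is recovered from the column means.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b0 = y_mean - x_mean @ beta
    return b0, beta


def ridge_path_errors(X_train, y_train, X_test, y_test, alphas):
    """Held-out mean squared error for each penalty value (hypothetical helper)."""
    errors = []
    for alpha in alphas:
        b0, beta = ridge_fit(X_train, y_train, alpha)
        pred = b0 + X_test @ beta
        errors.append(np.mean((y_test - pred) ** 2))
    return errors
```

As with stepwise regression, the amount of constraint is chosen by out-of-sample error: sweep a range of alpha values and keep the one whose held-out error is lowest.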
This chapter develops an extended family of methods for controlling the overfitting inherent in ...