Chapter 11
REGRESSION MODELING
11.1 THE ESTIMATION TASK
Thus far in the Modeling Phase we have covered the following tasks:
- Classification task
- Clustering task
Two tasks remain to be covered:
- Estimation task
- Association task
In this chapter, we cover the estimation task; later, in Chapter 14, we will cover the association task.
The most widespread method for performing the estimation task is linear regression. Simple linear regression approximates the relationship between a numeric predictor and a continuous target, using a straight line. Multiple regression modeling approximates the relationship between a set of p > 1 predictors and a single continuous target, using a p‐dimensional plane or hyperplane.
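Both cases described above fit a linear surface to the data. The Python below is a minimal sketch of that fit. It assumes ordinary least squares as the fitting criterion, since the section has not yet named one, and it uses made-up toy data. With one predictor it returns the intercept and slope of a straight line. With p > 1 predictors it returns the coefficients of a plane or hyperplane. The helpers `solve_linear_system`, `fit_ols`, and `predict` are hypothetical names written for this illustration, not functions from any library.

```python
# Minimal sketch (hypothetical helpers, not the book's code): ordinary least
# squares for  y_hat = b0 + b1*x1 + ... + bp*xp.
# With p = 1 this fits a straight line; with p > 1, a plane or hyperplane.
# Assumption: coefficients come from solving the normal equations
# (X^T X) b = X^T y by Gaussian elimination; all data below are made up.


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitute from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Estimate the continuous target for one record of predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


if __name__ == "__main__":
    # Simple linear regression: one predictor, fitted straight line.
    x = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
    y = [3.1, 4.9, 7.2, 8.8, 11.1]
    print("line:", [round(b, 3) for b in fit_ols(x, y)])

    # Multiple regression: p = 2 predictors, fitted plane.
    rows = [(1.0, 2.0), (2.0, 0.0), (3.0, 1.0), (4.0, 3.0), (5.0, 1.0), (0.0, 4.0)]
    y2 = [2.0 + 1.5 * a - 0.8 * b for a, b in rows]  # exact plane, no noise
    coefs = fit_ols(rows, y2)
    print("plane:", [round(b, 3) for b in coefs])
    print("estimate at (6, 2):", round(predict(coefs, (6.0, 2.0)), 3))
```

Solving the normal equations directly keeps the sketch free of dependencies. When predictors are nearly collinear, a library least-squares routine that avoids forming X^T X, such as NumPy's `numpy.linalg.lstsq`, is numerically safer.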
11.2 DESCRIPTIVE REGRESSION MODELING
The usual multiple regression model is a parametric model, defined by the following equation:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

where the x's represent the predictor variables, ε represents the random error term, and the β's represent the unknown model parameters, whose values are estimated using the data.¹ Estimating model parameters from sample data is the province of classical statistical inference. The Data Science Methodology outlined in Chapter 1, however, employs cross‐validation rather than classical statistical inference to validate model results. Thus, in this book, we will bypass the parametric regression equation above in favor of a descriptive approach to regression modeling, using the ...