Chapter 8 Simple Linear Regression
Regression modeling represents a powerful and elegant method for estimating the value of a continuous target variable. In this chapter, we introduce regression modeling through simple linear regression, where a straight line is used to approximate the relationship between a single continuous predictor variable and a single continuous response variable. Later, in Chapter 9, we turn to multiple regression, where several predictor variables are used to estimate a single response.
8.1 An Example of Simple Linear Regression
To develop the simple linear regression model, consider the Cereals data set, an excerpt of which is presented in Table 8.1. The Cereals data set contains nutritional information for 77 breakfast cereals, and includes the following variables:
- Cereal name
- Cereal manufacturer
- Type (hot or cold)
- Calories per serving
- Grams of protein
- Grams of fat
- Milligrams of sodium
- Grams of fiber
- Grams of carbohydrates
- Grams of sugar
- Milligrams of potassium
- Percentage of recommended daily allowance of vitamins (0%, 25%, or 100%)
- Weight of one serving
- Number of cups per serving
- Shelf location (1 = bottom, 2 = middle, 3 = top)
- Nutritional rating, as calculated by Consumer Reports
Table 8.1 Excerpt from Cereals data set: eight fields, first 16 cereals
Cereal Name | Manufacturer | Sugars | Calories | Protein | Fat | Sodium | Rating |
100% Bran | N | 6 | 70 | 4 | 1 | 130 | 68.4030 |
100% Natural Bran | Q | 8 | 120 | 3 | 5 | 15 | 33.9837 |
All-Bran | K | 5 | 70 | 4 | 1 | 260 | 59.4255 |
All-Bran ... |
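To make the idea of approximating a relationship with a straight line concrete, the following minimal sketch fits a least-squares line to the three complete rows visible in the excerpt of Table 8.1. It assumes, purely for illustration, that sugars is the predictor and nutritional rating is the response; the helper names `fit_line` and `predict` are hypothetical. Three observations are far too few for a meaningful model, so the result only shows the mechanics of the calculation.

```python
# Three (sugars, rating) pairs copied from the visible rows of Table 8.1.
# Assumption: sugars is treated as the predictor and rating as the response.
sugars = [6.0, 8.0, 5.0]
rating = [68.4030, 33.9837, 59.4255]


def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_line(sugars, rating)


def predict(s):
    """Estimated rating for a cereal with s grams of sugar (illustrative only)."""
    return b0 + b1 * s


print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

For these three rows the slope comes out negative: in this tiny sample, the cereals with more sugar have lower ratings. A fit on all 77 cereals would be needed before drawing any conclusion.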