7.2 SIMPLE REGRESSION MODELS

7.2.1 Overview

A simple regression model is a formula describing the relationship between one descriptor variable and one response variable. These formulas are easy to explain; however, the fitted model is sensitive to outliers in the data. This section presents methods for generating simple linear regression models as well as simple nonlinear regression models.

7.2.2 Simple Linear Regression

Overview

Where there appears to be a linear relationship between two variables, a simple linear regression model can be generated. For example, Figure 7.9 shows the relationship between a descriptor variable B and a response variable A. The diagram shows a high degree of correlation between the two variables. As descriptor variable B increases, response variable A increases at a roughly constant rate. A straight line representing a model can be drawn through the center of the points. A model that predicts values along this line would fit the data well.
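The "high degree of correlation" described above can be checked numerically before fitting a line. The following is a minimal sketch that computes the Pearson correlation coefficient using only the Python standard library. The function name `pearson_r` and the values in `b_values` and `a_values` are hypothetical and chosen for illustration; they are not the data behind Figure 7.9.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observations (illustration only, not taken from the book).
b_values = [1.0, 2.0, 3.0, 4.0, 5.0]   # descriptor variable B
a_values = [2.1, 3.9, 6.2, 8.0, 9.8]   # response variable A

r = pearson_r(b_values, a_values)
print(f"r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear relationship, which is the situation in which a straight-line model is a reasonable choice.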

A straight line can be described using the formula:

y = a + bx

where a is the value of y at which the line crosses the y-axis (the intercept) and b is the slope of the line. This is shown graphically in Figure 7.10.
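One common way to choose a and b from data is ordinary least squares, which picks the line that minimizes the sum of squared vertical distances between the points and the line. This excerpt does not show which method the book goes on to use, so the sketch below is an assumption rather than the author's procedure. The function name `fit_line` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")

# Use the fitted line to predict the response for a new descriptor value.
predicted = a + b * 6.0
```

Because every point contributes a squared distance, a single extreme observation can pull the fitted line noticeably, which is why this kind of model is sensitive to outliers.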

In Table 7.8, a data set of observations from a grocery store contains the variables Income and Monthly sales. The variable Income refers to the yearly income of a customer, and Monthly sales represents the amount that customer purchases per month. This data can be plotted on a scatterplot and a linear relationship ...
