6 Regression
6.1 SIMPLE LINEAR LEAST SQUARES REGRESSION
Let Y be a random variable defined as the "dependent" or "response" variable, and X another random variable defined as the "independent" or "factor" variable. Assume we have a joint sample $x_i, y_i$, $i = 1, \dots, n$, or a set of n paired values of the two variables. This is a bivariate or two-variable situation. We already encountered this situation in Chapter 4 when we defined the covariance and correlation coefficient. Let us start this chapter by considering the scatter plot or diagram, which shows data pairs from a sample as markers on the x–y plane. As an example, Figure 6.1 shows pairs of x–y values for air temperature and ground-level ozone. Note that some pairs of points have much larger values of ozone concentration for the same temperature than the trend would indicate. These are probably outliers and are identified by the observation number next to the marker.
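As a sketch of how such a scatter plot can be produced, the following Python snippet plots a hypothetical paired sample and labels each marker with its observation number. The data values here are invented for illustration only; they are not the data of Figure 6.1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical paired sample (x_i, y_i): temperature and ozone.
# Illustrative values only; not the data shown in Figure 6.1.
rng = np.random.default_rng(1)
temp = rng.uniform(60, 95, size=30)
ozone = 2.0 * temp - 100 + rng.normal(0, 12, size=30)

fig, ax = plt.subplots()
ax.scatter(temp, ozone, marker="o")
# Label each marker with its observation number, as in Figure 6.1.
for i, (x, y) in enumerate(zip(temp, ozone), start=1):
    ax.annotate(str(i), (x, y), textcoords="offset points", xytext=(4, 4))
ax.set_xlabel("Air temperature")
ax.set_ylabel("Ground-level ozone")
plt.show()
```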
Denote by $\hat{Y}$ a linear least squares (LLS) estimator of Y from X

$$\hat{Y} = b_0 + b_1 X \tag{6.1}$$
This is the equation of a straight line with intercept $b_0$ and slope $b_1$. For each data point i, we have the estimated value of Y at the specific values $x_i$

$$\hat{y}_i = b_0 + b_1 x_i \tag{6.2}$$
The error (residual) for data point i is

$$e_i = y_i - \hat{y}_i \tag{6.3}$$
And thus another way of writing the relationship of the $x_i$ and $y_i$ observations is

$$y_i = b_0 + b_1 x_i + e_i \tag{6.4}$$
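A minimal Python sketch of Equations 6.2 through 6.4 (the function name `residuals` is ours, introduced here for illustration): given candidate coefficients b0 and b1, it computes the fitted values and the residuals, so that the identity $y_i = \hat{y}_i + e_i$ of Equation 6.4 holds for any choice of coefficients.

```python
import numpy as np

def residuals(x, y, b0, b1):
    """Fitted values yhat_i = b0 + b1*x_i (Eq. 6.2) and
    residuals e_i = y_i - yhat_i (Eq. 6.3)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    yhat = b0 + b1 * x   # Eq. 6.2
    e = y - yhat         # Eq. 6.3
    return yhat, e

# For any b0, b1, adding the two outputs recovers y (Eq. 6.4):
# y == yhat + e, element by element.
```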
Take the square and sum over all observations to obtain the total squared error

$$q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{6.5}$$
We want to find the values of the coefficients (intercept and slope) $b_0$, $b_1$ which minimize the sum of squared errors (over all i = 1, …, n). That is to say, we want to find $b_0$, $b_1$ such that

$$\min_{b_0, b_1} q = \min_{b_0, b_1} \sum_{i=1}^{n} e_i^2 = \min_{b_0, b_1} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{6.6}$$
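As an illustration of Equation 6.6, the sketch below minimizes q numerically over $(b_0, b_1)$ with scipy.optimize.minimize and checks the result against numpy.polyfit, which fits the same least squares line. The sample data are synthetic, invented only for this example.

```python
import numpy as np
from scipy.optimize import minimize

def q(b, x, y):
    """Total squared error q(b0, b1) = sum_i (y_i - b0 - b1*x_i)^2 (Eq. 6.5)."""
    b0, b1 = b
    e = y - (b0 + b1 * x)
    return np.sum(e**2)

# Synthetic sample, for illustration only.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 20)
y = 1.5 + 0.8 * x + rng.normal(0, 1, 20)

# Minimize q over (b0, b1), as in Eq. 6.6.
res = minimize(q, x0=np.zeros(2), args=(x, y))
b0, b1 = res.x
print(b0, b1)
# Cross-check: np.polyfit returns [b1, b0] for a degree-1 fit.
print(np.polyfit(x, y, 1))
```

Both approaches should agree to several decimal places, since the least squares objective q is a smooth convex function of the two coefficients with a unique minimum.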