6 Regression
6.1 SIMPLE LINEAR LEAST SQUARES REGRESSION
Let Y be a random variable dened as the “dependent” or “response” variable, and X another ran-
dom variable dened as the “independent” or “factor” variable. Assume we have a joint sample x
i
,
y
i
, i = 1, …, n or a set of n paired values of the two variables. This is a bivariate or two-variable
situation. We already encountered this situation in Chapter 4 when we dened the covariance and
correlation coefcient. Let us start this chapter by considering the scatter plot or diagram, which
shows data pairs from a sample as markers on the x–y plane. As an example, Figure 6.1 shows pairs
of x–y values for air temperature and ground-level ozone. Note that some pairs of points have much
larger values of ozone concentration for the same temperature, as the trend would indicate. These
are probably outliers and are identied by the observation number next to the marker.
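A scatter plot of this kind can be sketched in Python with matplotlib. The temperature and ozone values below are made up for illustration (they are not the Figure 6.1 data), and labeling each marker with its observation number mimics the outlier identification in the figure:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for scripts
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical temperature (x) and ozone (y) pairs; not the actual Figure 6.1 data
temp = np.array([58, 61, 64, 67, 70, 73, 76, 79, 82, 85])
ozone = np.array([12, 18, 20, 28, 31, 40, 47, 55, 110, 66])

fig, ax = plt.subplots()
ax.scatter(temp, ozone)
for i, (xi, yi) in enumerate(zip(temp, ozone), start=1):
    ax.annotate(str(i), (xi, yi))  # label each marker with its observation number
ax.set_xlabel("Temp.")
ax.set_ylabel("Ozone")
fig.savefig("scatter_ozone.png")
```

Here observation 9 sits well above the overall trend and would stand out as a candidate outlier in the plot.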
Denote by Ŷ a linear least squares (LLS) estimator of Y from X

Ŷ = b₀ + b₁X    (6.1)
This is the equation of a straight line with intercept b₀ and slope b₁. For each data point i, we have the estimated value of Y at the specific value xᵢ

ŷᵢ = b₀ + b₁xᵢ    (6.2)
The error (residual) for data point i is

eᵢ = yᵢ − ŷᵢ    (6.3)
And thus another way of writing the relationship of the xᵢ and yᵢ observations is

yᵢ = b₀ + b₁xᵢ + eᵢ    (6.4)
Take the square and sum over all observations to obtain the total squared error

q = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²    (6.5)
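As a concrete sketch of Equations 6.2 through 6.5, the fitted values, residuals, and total squared error for a trial pair of coefficients can be computed directly. The data and the trial values b₀ = 0.1, b₁ = 2.0 below are made up for illustration:

```python
import numpy as np

# Hypothetical paired sample (x_i, y_i); not the ozone data of Figure 6.1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Trial intercept b0 and slope b1 (Equation 6.1)
b0, b1 = 0.1, 2.0

y_hat = b0 + b1 * x   # estimated values, Equation 6.2
e = y - y_hat         # residuals, Equation 6.3
q = np.sum(e ** 2)    # total squared error, Equation 6.5

print(round(q, 3))    # prints 0.14
```

A different choice of b₀, b₁ gives a different q; the question addressed next is which choice makes q smallest.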
We want to nd the value of the coefcients (intercept and slope) b
0
, b
1
which minimize the sum of
squared errors (over all i = 1, …, n). That is to say we want to nd b
0
, b
1
such that
min min min
()
,, ,bb bb
i
bb
ii
i
n
i
n
qe
yy
01 01 01
22
11
==
==
(6.6)
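One way to see the minimization of Equation 6.6 at work is to compute the standard closed-form least squares coefficients, b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄, and check that perturbing either coefficient never decreases q. This is a sketch with made-up data, using the well-known closed-form result rather than this book's derivation:

```python
import numpy as np

# Hypothetical paired sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sse(b0, b1):
    """Total squared error q for a given intercept and slope (Equation 6.5)."""
    e = y - (b0 + b1 * x)
    return np.sum(e ** 2)

# Closed-form least squares coefficients (standard result)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Nudging either coefficient away from the optimum should not decrease q
q_min = sse(b0, b1)
assert all(sse(b0 + d0, b1 + d1) >= q_min
           for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.1, 0.0, 0.1))
print(round(b0, 3), round(b1, 3))  # prints 0.14 1.96
```

Because q is a convex function of b₀ and b₁, the point where its gradient vanishes is the unique global minimum, which is what the perturbation check confirms numerically.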
How do we nd which values of b
0
, b
1
minimize q? We express q as a function of b
0
, b
1
and
then we nd the values of b
0
, b
1
that make the gradient of q equal to zero. The gradient is the
partial derivative of q with respect to each coefcient. At this point, some of you may need a
review of derivatives and optimization. You can skip the following section if you are familiar
with the topic.
6.1.1 Derivatives and Optimization
The derivative of a function f(x) is denoted by df/dx. It represents the rate of change of f with x and is equal to the gradient or slope of f with respect to x. This assumes that f(x) varies continuously along x. You can think of a derivative as a ratio of very small changes of two variables: for example, a very small change Δf divided by a very small change Δx. The derivative is approximately equal to the slope obtained as the ratio Δf/Δx (Figure 6.2). Therefore, df/dx ≈ Δf/Δx when the deltas Δf, Δx are very small or infinitesimal, in which case they can be referred to as the differentials df and dx.
FIGURE 6.2 Concept of derivative of a function.
FIGURE 6.1 Scatter plot of x = air temperature and y = ozone concentration with identification of outliers.