THE ESSENTIAL STATS 39
OK, the difference is not significant. If it were, we would then check whether it is large, that is,
whether the difference matters in practice.
When we ran the TwoWayAnovaConf.py code, we got:
Observed F-statistic: 0.93
We have 90.0 % confidence that the true F-statistic is between: 0.50 and 8.44
***** Bias Corrected Confidence Interval *****
We have 90.0 % confidence that the true F-statistic is between: 0.20 and 2.04
4.7 Linear Regression
4.7.1 Why and when
Linear regression is closely tied to linear correlation. Here we try to find the line that best fits
our data so that we can use it to predict y given a new x. Correlation measures how tightly our data
fits that line, and therefore how good we expect our prediction to be.
4.7.2 Calculate with example
The first step is to draw a scatter plot of your data. Traditionally, the independent variable is
placed along the x-axis, and the dependent variable is placed along the y-axis. The dependent variable
is the one that we expect to change when the independent one changes. Note whether the
data looks as if it lies approximately along a straight line. If it has some other pattern, for example,
a curve, then a linear regression should not be used.
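A scatter plot is normally drawn with a plotting package. As a minimal, dependency-free sketch, the hypothetical helper `ascii_scatter` below prints a rough text scatter plot of the example data used later in this section (the x and y columns of the table), with x along the horizontal axis and larger y values nearer the top. It is only meant for eyeballing whether the points fall roughly along a straight line.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return the rows of a rough text scatter plot.

    x runs left to right; larger y values appear nearer the top.
    Points that land in the same character cell are drawn once.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so that row 0 is the top
    return ["".join(cells) for cells in grid]


# Example data: the x and y columns from this section's worked example.
xs = [1350, 1510, 1420, 1210, 1250, 1300, 1580, 1310, 1290, 1320, 1490, 1200, 1360]
ys = [3.6, 3.8, 3.7, 3.3, 3.9, 3.4, 3.8, 3.7, 3.5, 3.4, 3.8, 3.0, 3.1]

for line in ascii_scatter(xs, ys):
    print("|" + line)
print("+" + "-" * 40)
```

A real plotting library gives a far better picture; the text version simply avoids any dependency.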
The most common method used to find a regression line is the least squares method. Remember
that the equation of a line is:
y = bx + a
where b equals the slope of the line and a is where the line crosses the y-axis.
The least squares method chooses the line that minimizes the sum of the squared vertical distances
between our data points (the observed values) and our line (the predicted values).
The equation of the line we are trying to find is:
y' = bx + a
where y' is the predicted value of y for some x.
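To see what "least squares" means numerically, the sketch below (the helper names `sum_squared_errors` and `predict` are hypothetical) fits a line to this section's example data with the standard least squares formulas, then shows that nudging the slope or the intercept increases the sum of squared vertical distances. It uses exact, unrounded means, so its b and a can differ slightly from hand calculations that round intermediate values.

```python
# Example data: the x and y columns from this section's worked example.
xs = [1350, 1510, 1420, 1210, 1250, 1300, 1580, 1310, 1290, 1320, 1490, 1200, 1360]
ys = [3.6, 3.8, 3.7, 3.3, 3.9, 3.4, 3.8, 3.7, 3.5, 3.4, 3.8, 3.0, 3.1]


def sum_squared_errors(b, a, xs, ys):
    """Sum of squared vertical distances between each observed y and y' = b*x + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Standard least squares slope and intercept, using unrounded means.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar


def predict(x):
    """Predicted value y' for a new x, using the fitted line."""
    return b * x + a


best = sum_squared_errors(b, a, xs, ys)
print(f"fitted line: y' = {b:.6f}x + {a:.4f}, squared error = {best:.4f}")

# Nudging the slope or the intercept moves away from the minimum.
for db, da in [(1e-4, 0.0), (-1e-4, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    other = sum_squared_errors(b + db, a + da, xs, ys)
    print(f"slope {db:+g}, intercept {da:+g}: squared error = {other:.4f}")

# Predict y for an arbitrary new x (1400 is just an illustration).
print(f"y' at x = 1400: {predict(1400):.2f}")
```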
BE CAREFUL
Regression can be misleading when there are outliers or a nonlinear relationship.
40 STATISTICS IS EASY!
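We have to calculate b and a. The slope is:
$$b = \frac{SP_{XY}}{SS_X}$$
where $SP_{XY}$ is the sum of products:
$$SP_{XY} = \sum (x_i - \bar{x})(y_i - \bar{y})$$
and $SS_X$ is the sum of squares for X:
$$SS_X = \sum (x_i - \bar{x})^2$$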
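The two sums translate directly into code. In the sketch below (function names are hypothetical), the means are computed exactly rather than rounded, so the totals come out very slightly different from the hand-calculated table in this section, which rounds the means to 1353 and 3.5.

```python
def sum_of_products(xs, ys):
    """SP_XY: the sum of (x_i - x_bar) * (y_i - y_bar)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


def sum_of_squares(xs):
    """SS_X: the sum of (x_i - x_bar) squared."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


# Example data: the x and y columns from this section's worked example.
xs = [1350, 1510, 1420, 1210, 1250, 1300, 1580, 1310, 1290, 1320, 1490, 1200, 1360]
ys = [3.6, 3.8, 3.7, 3.3, 3.9, 3.4, 3.8, 3.7, 3.5, 3.4, 3.8, 3.0, 3.1]

sp_xy = sum_of_products(xs, ys)
ss_x = sum_of_squares(xs)
b = sp_xy / ss_x  # slope of the least squares line
print(f"SP_XY = {sp_xy:.2f}, SS_X = {ss_x:.2f}, b = {b:.4f}")
```

With exact means this gives SP_XY ≈ 230.46 and SS_X ≈ 163676.92, against 230.5 and 163677 from the rounded means; the slope still rounds to 0.0014.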
In our example, $SP_{XY} = 230.5$ and $SS_X = 163677$, and $230.5 / 163677 = 0.0014$, so $b = 0.0014$.
The regression line always passes through the point $(\bar{x}, \bar{y})$, so we can plug this point into our
equation to find a, the value where the line crosses the y-axis.
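Written out, the step from the point $(\bar{x}, \bar{y})$ to the intercept is a one-line rearrangement. The substitution below uses the rounded values that appear in this section ($b = 0.0014$, $\bar{x} = 1353$, $\bar{y} = 3.5$) purely as an illustration. The intercept is sensitive to that rounding: with the unrounded means and slope computed from the example data, a comes out near 1.63 instead.

```latex
\begin{aligned}
\bar{y} &= b\,\bar{x} + a \quad\Longrightarrow\quad a = \bar{y} - b\,\bar{x} \\
a &\approx 3.5 - 0.0014 \times 1353 = 3.5 - 1.8942 = 1.6058
\end{aligned}
```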
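x_i | y_i | x̄ | ȳ | x_i - x̄ | y_i - ȳ | (x_i - x̄)(y_i - ȳ) | (x_i - x̄)^2
1350 | 3.6 | 1353 | 3.5 | -3 | 0.1 | -0.3 | 9
1510 | 3.8 | 1353 | 3.5 | 157 | 0.3 | 47.1 | 24649
1420 | 3.7 | 1353 | 3.5 | 67 | 0.2 | 13.4 | 4489
1210 | 3.3 | 1353 | 3.5 | -143 | -0.2 | 28.6 | 20449
1250 | 3.9 | 1353 | 3.5 | -103 | 0.4 | -41.2 | 10609
1300 | 3.4 | 1353 | 3.5 | -53 | -0.1 | 5.3 | 2809
1580 | 3.8 | 1353 | 3.5 | 227 | 0.3 | 68.1 | 51529
1310 | 3.7 | 1353 | 3.5 | -43 | 0.2 | -8.6 | 1849
1290 | 3.5 | 1353 | 3.5 | -63 | 0 | 0 | 3969
1320 | 3.4 | 1353 | 3.5 | -33 | -0.1 | 3.3 | 1089
1490 | 3.8 | 1353 | 3.5 | 137 | 0.3 | 41.1 | 18769
1200 | 3.0 | 1353 | 3.5 | -153 | -0.5 | 76.5 | 23409
1360 | 3.1 | 1353 | 3.5 | 7 | -0.4 | -2.8 | 49
Totals | - | - | - | - | - | 230.5 | 163677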
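The table's hand calculation can be reproduced in a few lines. This sketch (the function name `deviation_table` is hypothetical) deliberately uses the same rounded means as the table, 1353 and 3.5, so its totals should match the Totals row. It assumes a whole-number mean for x and y values with one decimal place, which lets it do the arithmetic in tenths and avoid floating-point noise.

```python
def deviation_table(xs, ys, x_bar, y_bar):
    """Build the rows of the hand-calculation table and return (rows, SP_XY, SS_X).

    Assumes x_bar is a whole number and every y, and y_bar, has one decimal place.
    Each row is (x_i, y_i, x_i - x_bar, y_i - y_bar, product, squared x deviation).
    """
    rows = []
    sp_tenths = 0  # sum of products, kept in tenths so the arithmetic is exact
    ss_x = 0
    for x, y in zip(xs, ys):
        dx = x - x_bar
        # y - y_bar is a whole number of tenths; round() removes float noise.
        dy_tenths = round((y - y_bar) * 10)
        sp_tenths += dx * dy_tenths
        ss_x += dx * dx
        rows.append((x, y, dx, dy_tenths / 10, dx * dy_tenths / 10, dx * dx))
    return rows, sp_tenths / 10, ss_x


# Example data: the x and y columns from the table above.
xs = [1350, 1510, 1420, 1210, 1250, 1300, 1580, 1310, 1290, 1320, 1490, 1200, 1360]
ys = [3.6, 3.8, 3.7, 3.3, 3.9, 3.4, 3.8, 3.7, 3.5, 3.4, 3.8, 3.0, 3.1]

rows, sp_xy, ss_x = deviation_table(xs, ys, x_bar=1353, y_bar=3.5)
print(f"{'x_i':>6} {'y_i':>5} {'dx':>6} {'dy':>5} {'dx*dy':>7} {'dx^2':>7}")
for x, y, dx, dy, product, sq in rows:
    print(f"{x:6} {y:5.1f} {dx:6} {dy:5.1f} {product:7.1f} {sq:7}")
print(f"Totals: SP_XY = {sp_xy:.1f}, SS_X = {ss_x}")
```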
