Think Stats, 2nd Edition

by Allen B. Downey
October 2014
Content level: Beginner
226 pages
5h 42m
English
O'Reilly Media, Inc.
Content preview from Think Stats, 2nd Edition

Chapter 10. Linear Least Squares

The code for this chapter is in linear.py. For information about downloading and working with this code, see Using the Code.

Least Squares Fit

Correlation coefficients measure the strength and sign of a relationship, but not the slope. There are several ways to estimate the slope; the most common is a linear least squares fit. A “linear fit” is a line intended to model the relationship between variables. A “least squares” fit is one that minimizes the mean squared error (MSE) between the line and the data.

Suppose we have a sequence of points, ys, that we want to express as a function of another sequence, xs. If there is a linear relationship between xs and ys with intercept inter and slope slope, we expect each ys[i] to be inter + slope * xs[i].

But unless the correlation is perfect, this prediction is only approximate. The vertical deviation from the line, or residual, is

res = ys - (inter + slope * xs)
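As a minimal sketch of that expression, the snippet below computes residuals with NumPy arrays so the subtraction applies element by element. The function name residuals and the sample values are assumptions made up for illustration; they are not taken from linear.py.

```python
import numpy as np

def residuals(xs, ys, inter, slope):
    """Return the vertical deviations of ys from the line inter + slope * xs."""
    # Convert to arrays so the arithmetic is element-wise
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    return ys - (inter + slope * xs)

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.5, 5.5, 8.0]

# Residuals relative to a guessed line with intercept 0 and slope 2
res = residuals(xs, ys, inter=0.0, slope=2.0)
print(res)
```

A positive residual means the point lies above the line, and a negative one means it lies below.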

The residuals might be due to random factors like measurement error, or nonrandom factors that are unknown. For example, if we are trying to predict weight as a function of height, unknown factors might include diet, exercise, and body type.

If we get the parameters inter and slope wrong, the residuals get bigger, so it makes intuitive sense that the parameters we want are the ones that minimize the residuals.
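To make that intuition concrete, here is a hedged sketch that measures "bigger" with the mean squared error named earlier in this section. It estimates inter and slope with one standard closed-form solution for a straight-line fit, then shows that nudging either parameter raises the error. The function names and the data are hypothetical, and this is not necessarily how linear.py implements the fit.

```python
import numpy as np

def mean_squared_error(xs, ys, inter, slope):
    """Mean of the squared residuals for the line inter + slope * xs."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    res = ys - (inter + slope * xs)
    return np.mean(res ** 2)

def least_squares(xs, ys):
    """Closed-form straight-line fit; returns (inter, slope)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    mean_x, mean_y = xs.mean(), ys.mean()
    # Slope is the sum of cross-deviations divided by the sum of squared x deviations
    slope = np.sum((xs - mean_x) * (ys - mean_y)) / np.sum((xs - mean_x) ** 2)
    # The fitted line passes through the point of means
    inter = mean_y - slope * mean_x
    return inter, slope

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.5, 5.5, 8.0]

inter, slope = least_squares(xs, ys)
best = mean_squared_error(xs, ys, inter, slope)

# Deliberately wrong parameters give a larger error
worse_slope = mean_squared_error(xs, ys, inter, slope + 0.5)
worse_inter = mean_squared_error(xs, ys, inter + 1.0, slope)

print(inter, slope)
print(best, worse_slope, worse_inter)
```

For a quick cross-check, np.polyfit(xs, ys, 1) returns the slope and intercept of the same least squares line, in that order.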

We might try to minimize the absolute value of the residuals, or their squares, or their cubes; but the most common choice is to minimize the sum of ...


ISBN: 9781491907344