Chapter 17. Regression
In the previous chapter we saw several examples of logistic regression, which is based on the assumption that the likelihood of an outcome, expressed in the form of log odds, is a linear function of some quantity (continuous or discrete).
In this chapter we’ll work on examples of simple linear regression, which models the relationship between two quantities. Specifically, we’ll look at changes over time in snowfall and the marathon world record.
The models we’ll use have three parameters, so you might want to review the tools we used for the three-parameter model in Chapter 15.
More Snow?
I am under the impression that we don’t get as much snow around here as we used to. By “around here” I mean Norfolk County, Massachusetts, where I was born, grew up, and currently live. And by “used to” I mean compared to when I was young, like in 1978 when we got 27 inches of snow and I didn’t have to go to school for a couple of weeks.
Fortunately, we can test my conjecture with data. Norfolk County happens to be the location of the Blue Hill Meteorological Observatory, which keeps the oldest continuous weather record in North America.
Data from this and many other weather stations is available from the National Oceanic and Atmospheric Administration (NOAA). I collected data from the Blue Hill Observatory from May 11, 1967 to May 11, 2020.
We can use pandas to read the data into DataFrame
:
import
pandas
as
pd
df
=
pd
.
read_csv
(
'2239075.csv'
,
parse_dates
=
[
2
])
The columns ...
Get Think Bayes, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.