Art, like morality, consists in drawing the line somewhere.

G. K. Chesterton

In Chapter 5, we used the `correlation` function to measure the strength of the linear relationship between two variables. For most applications, knowing that such a linear relationship exists isn’t enough. We’ll want to understand the nature of the relationship. This is where we’ll use simple linear regression.

Recall that we were investigating the relationship between a DataSciencester user’s number of friends and the amount of time the user spends on the site each day. Let’s assume that you’ve convinced yourself that having more friends *causes* people to spend more time on the site, rather than one of the alternative explanations we discussed.

The VP of Engagement asks you to build a model describing this relationship. Since you found a pretty strong linear relationship, a natural place to start is a linear model.

In particular, you hypothesize that there are constants *α* (alpha) and *β* (beta) such that:

$${y}_{i}=\beta {x}_{i}+\alpha +{\epsilon}_{i}$$

where ${y}_{i}$ is the number of minutes user *i* spends on the site daily, ${x}_{i}$ is the number of friends user *i* has, and ${\epsilon}_{i}$ is a (hopefully small) error term representing the fact that there are other factors not accounted for by this simple model.
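To make the role of each term concrete, here is a minimal sketch that generates data from a model of this form. The function name `simulate_minutes`, the parameter values, and the choice of Gaussian noise for the error term are all assumptions made for illustration; nothing here is fitted to real DataSciencester data.

```python
import random
from typing import List

def simulate_minutes(alpha: float, beta: float, num_friends: List[float],
                     noise_sd: float, seed: int = 0) -> List[float]:
    """Draw y_i = beta * x_i + alpha + eps_i for each x_i.

    Assumption: eps_i is Gaussian with mean 0 and standard deviation noise_sd.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [beta * x_i + alpha + rng.gauss(0, noise_sd) for x_i in num_friends]

# Hypothetical values, chosen only for illustration
num_friends = [10, 20, 30, 40]
minutes = simulate_minutes(alpha=20.0, beta=1.0,
                           num_friends=num_friends, noise_sd=2.0)

# With no noise, every point falls exactly on the line y = beta * x + alpha
on_the_line = simulate_minutes(alpha=20.0, beta=1.0,
                               num_friends=num_friends, noise_sd=0.0)
```

The smaller the error terms, the closer the observed points sit to the line, which is why we hope they are small.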

Assuming we’ve determined such an `alpha` and `beta`, then we make predictions simply with:

```python
def predict(alpha: float, beta: float, x_i: float) -> float:
    return beta * x_i + alpha
```
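As a quick usage sketch, the snippet below calls `predict` with made-up parameters and compares the result to a made-up observation. The values of `alpha` and `beta`, the observed data point, and the `residual` helper are hypothetical and serve only to show how a prediction relates to the error term in the model.

```python
def predict(alpha: float, beta: float, x_i: float) -> float:
    return beta * x_i + alpha

def residual(alpha: float, beta: float, x_i: float, y_i: float) -> float:
    """Hypothetical helper: observed value minus predicted value."""
    return y_i - predict(alpha, beta, x_i)

# Hypothetical parameters, not fitted to any data
alpha, beta = 20.0, 1.0

# Predicted daily minutes for a user with 50 friends: 1.0 * 50 + 20.0
predicted = predict(alpha, beta, 50)

# If that user actually spent 75 minutes, the model under-predicts by 5
leftover = residual(alpha, beta, 50, 75.0)
```

Under these assumed parameters, the leftover amount is what the error term would have to absorb for that user.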

How do ...
