Chapter 11. Linear Regression and ANOVA

Introduction

In statistics, modeling is where we get down to business. Models quantify the relationships between our variables. Models let us make predictions.

A simple linear regression is the most basic model. It’s just two variables and is modeled as a linear relationship with an error term:

yi = β0 + β1xi + εi

We are given the data for x and y. Our mission is to fit the model, which will give us the best estimates for β0 and β1 (Recipe 11.1).

That generalizes naturally to multiple linear regression, where we have multiple variables on the righthand side of the relationship (Recipe 11.2):

yi = β0 + β1ui + β2vi + β3wi + εi

Statisticians call u, v, and w the predictors and y the response. Obviously, the model is useful only if there is a fairly linear relationship between the predictors and the response, but that requirement is much less restrictive than you might think. Recipe 11.11 discusses transforming your variables into a (more) linear relationship so that you can use the well-developed machinery of linear regression.

The beauty of R is that anyone can build these linear models. The models are built by a function, lm, which returns a model object. From the model object, we get the coefficients (βi) and regression statistics. It’s easy.

The horror of R is that anyone can build these models. Nothing requires you to check that the model is reasonable, much less statistically significant. Before you blindly believe a model, check it. Most of the ...

Get R Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.