Example: A Simple Linear Model

A linear regression assumes that a response variable y is a linear function of a set of predictor variables x1, x2, ..., xn, plus a random error term.
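To make the assumption concrete, here is a minimal sketch that is not from the book. It simulates a response that really is a linear function of two predictors plus noise, then fits it with lm. The variable names (x1, x2, y, sim, fit) and the coefficients 3, 2, and -0.5 are arbitrary choices made for illustration only.

```r
# Simulated data (an assumption for illustration, not baseball data):
# y depends linearly on x1 and x2, plus normally distributed noise.
set.seed(1)
n  <- 200
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 3 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.5)

sim <- data.frame(y, x1, x2)

# Fit y as a linear function of x1 and x2 by least squares.
fit <- lm(y ~ x1 + x2, data = sim)

# The estimated coefficients should land near the values used above.
coef(fit)
```

Because the data was generated from a linear model, the estimates should come out close to the intercept and slopes used in the simulation. With real data, such as the team batting data below, the true coefficients are unknown.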

As an example, we’re going to look at how different metrics predict the runs scored by a baseball team.[51] Let’s start by loading the data for every team between 2000 and 2008. We’ll use the SQLite database that we used in Chapter 14 and extract the fields we want using an SQL query:

> library(RSQLite)
> drv <- dbDriver("SQLite")
> con <- dbConnect(drv,
+   dbname=paste(.Library, "/nutshell/data/bb.db", sep=""))
> team.batting.00to08 <- dbGetQuery(con, 
+   paste(
+     'SELECT teamID, yearID, R as runs, ',
+     '   H-"2B"-"3B"-HR as singles, ',
+     '   "2B" as doubles, "3B" as triples, HR as homeruns, ',
+     '   BB as walks, SB as stolenbases, CS as caughtstealing, ',
+     '   HBP as hitbypitch, SF as sacrificeflies, ',
+     '   AB as atbats ',
+     '   FROM Teams ',
+     '   WHERE yearID between 2000 and 2008'
+     )
+   )
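The query puts double quotes around "2B" and "3B" because column names that start with a digit are not valid bare identifiers in SQL. The following sketch is an illustration only: it builds a hypothetical one-row Teams table with made-up numbers in an in-memory SQLite database (so it does not need bb.db) and computes singles the same way.

```r
library(DBI)
library(RSQLite)

# Hypothetical stand-in for the Teams table; the numbers are made up.
# check.names = FALSE keeps the column names "2B" and "3B" unchanged.
teams <- data.frame(teamID = "AAA", yearID = 2005, H = 150, HR = 20,
                    `2B` = 30, `3B` = 5, check.names = FALSE)

# An in-memory database avoids any dependence on a file path.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "Teams", teams)

# Singles are hits minus doubles, triples, and home runs.
res <- dbGetQuery(con,
  'SELECT teamID, H-"2B"-"3B"-HR AS singles
   FROM Teams
   WHERE yearID BETWEEN 2000 AND 2008')

dbDisconnect(con)
res
```

For this made-up row, singles works out to 150 - 30 - 5 - 20 = 95.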

Or, if you’d like, you can just load the file from the nutshell package:

> library(nutshell)
> data(team.batting.00to08)

Because this is a book about R and not a book about baseball, I renamed the common abbreviations to more intuitive names for plays. Let’s look at scatter plots of runs versus each of the other variables, so that we can see which variables are likely to be most important.

We’ll create a data frame for plotting, using the ...
