Unless you argue to the contrary, all of the rows in the dataframe will be used in the model fitting, there will be no offsets, and all values of the response variable will be given equal weight. Variables named in the model formula will come from the defined dataframe (data=mydata), the with function (p. 18) or from the attached dataframe (if there is one). Here we illustrate the following options:

- subset
- weights
- data
- offset
- na.action
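The defaults described above can be checked directly. The following is a minimal sketch on made-up data: the data frame `toy` and its columns `x` and `y` are hypothetical and are not the data used in the text. It compares a default fit against one where equal weights and a zero offset are spelled out, and shows `with` as an alternative to `data=`.

```r
# Hypothetical toy data; not the ipomopsis data used in the text.
set.seed(1)
toy <- data.frame(x = 1:20)
toy$y <- 2 + 0.5 * toy$x + rnorm(20)

# Default call: all rows used, no offset, equal weights.
m_default <- lm(y ~ x, data = toy)

# The same defaults written out explicitly.
m_explicit <- lm(y ~ x, data = toy,
                 weights = rep(1, nrow(toy)),  # equal weight for every row
                 offset  = rep(0, nrow(toy)))  # a zero offset changes nothing

# Compare the coefficient estimates of the two fits.
all.equal(coef(m_default), coef(m_explicit))

# with() evaluates the call inside the data frame, so attach() is not needed.
m_with <- with(toy, lm(y ~ x))
all.equal(coef(m_default), coef(m_with))
```

Supplying `data=` or using `with` keeps the variable lookup explicit, which may be preferable to `attach` when several dataframes share column names.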

We shall work with an example involving analysis of covariance (see p. 490 for details), where we have a mix of continuous and categorical explanatory variables:

`data<-read.table("c:\\temp\\ipomopsis.txt",header=TRUE)`

`attach(data)`

`names(data)`

`[1] "Root" "Fruit" "Grazing"`

The response is seed production (Fruit) with a continuous explanatory variable (Root diameter) and a two-level factor Grazing (Grazed and Ungrazed).
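For readers without the data file, the structure just described can be imitated with simulated values. This is only a sketch: the numbers in `toy` are invented, and the additive model `Fruit ~ Root + Grazing` is an assumption chosen for illustration, not necessarily the model analysed in the text.

```r
# Simulated stand-in with the same three columns; the values are NOT real data.
set.seed(7)
toy <- data.frame(
  Root    = runif(40, 4, 10),                                  # continuous
  Grazing = factor(rep(c("Grazed", "Ungrazed"), each = 20))    # two-level factor
)
toy$Fruit <- 10 * toy$Root - 40 +
  15 * (toy$Grazing == "Ungrazed") + rnorm(40, sd = 5)

# Inspect the structure: one factor, two numeric columns.
str(toy)
levels(toy$Grazing)

# One continuous and one categorical explanatory variable in the same formula.
m <- lm(Fruit ~ Root + Grazing, data = toy)
names(coef(m))
```

With R's default treatment contrasts, the factor contributes one coefficient for the second level (`Ungrazed`) measured relative to the first (`Grazed`).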

Perhaps the most commonly used modelling option is to fit the model to a subset of the data (e.g. fit the model to data from just the grazed plants). You could do this using subscripts on the response variable and all the explanatory variables:

`model<-lm(Fruit[Grazing=="Grazed"] ~ Root[Grazing=="Grazed"])`

but it is much more straightforward to use the subset argument, especially when there are lots of explanatory variables:

`model<-lm(Fruit~ Root,subset=(Grazing=="Grazed"))`
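The equivalence of the two approaches can be sketched on simulated data. The data frame `toy` below is a hypothetical stand-in with invented values, and the helper vector `grazed` is introduced only for this illustration.

```r
# Simulated stand-in for the data; the values are NOT the real ipomopsis data.
set.seed(42)
toy <- data.frame(
  Root    = runif(40, 4, 10),
  Grazing = factor(rep(c("Grazed", "Ungrazed"), each = 20))
)
toy$Fruit <- 10 * toy$Root - 40 + rnorm(40, sd = 5)

# Approach 1: subscripts on every variable in the formula.
grazed <- toy$Grazing == "Grazed"
m_subscript <- lm(toy$Fruit[grazed] ~ toy$Root[grazed])

# Approach 2: the subset argument, evaluated within the data frame.
m_subset <- lm(Fruit ~ Root, data = toy, subset = (Grazing == "Grazed"))

# The estimates agree; only the coefficient labels differ.
all.equal(unname(coef(m_subscript)), unname(coef(m_subset)))
names(coef(m_subscript))
names(coef(m_subset))

# Number of rows actually used in the fit.
nobs(m_subset)
```

In the subscript version each coefficient is labelled with the full subscripted expression, whereas the `subset` version uses the plain variable name.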

The answer, of course, is the same in both cases, but the summary.lm and summary.aov tables are neater with ...
