The model.matrix function

Creating tables of dummy variables for use in statistical modelling is extremely easy with the model.matrix function. A simple example shows what the function does. Suppose that our dataframe contains a factor called parasite, indicating the identity of a gut parasite, with five levels: vulgaris, kochii, splendens, viridis and knowlesii. The data file had no header row, so the variable name parasite had to be added afterwards, using names:

data<-read.table("c:\\temp\\parasites.txt")
names(data)<-"parasite"
attach(data)

In our modelling we want to create a two-level dummy variable (present/absent) for each parasite species, so that we can ask questions such as whether the mean value of the response variable differs significantly between cases where vulgaris is present and cases where it is absent. The long-winded way of doing this is to create a new factor for each species:

vulgaris<-factor(1*(parasite=="vulgaris"))
kochii<-factor(1*(parasite=="kochii"))

and so on, with 1 for TRUE (present) and 0 for FALSE (absent). This is how easy it is to do with model.matrix:

model.matrix(~parasite-1)

   parasitekochii parasiteknowlesii parasitesplendens parasiteviridis
1               0                 0                 0               0
2               0                 0                 1               0
3               0                 1                 0               0
4               0                 0                 0               0
5               0                 1                 0               0
6               0                 0                 0               1
7               0                 0                 1               0
8               0                 0                 1               0
9               0                 0                 0               1
10              0                 0                 0               0
11              0                 0                 1               0
12              0                 0                 0               1
13              0                 0                 1               0
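The same call can be tried without the data file. The following is a minimal sketch that assumes a small made-up factor (the dataframe toy and its six values are hypothetical, not the contents of parasites.txt), so that every indicator column can be inspected at once. It passes the dataframe through the data argument rather than relying on attach:

```r
# Hypothetical toy data: six made-up observations, not the real parasites.txt
toy <- data.frame(parasite = factor(c("vulgaris", "splendens", "knowlesii",
                                      "viridis", "kochii", "vulgaris")))

# One 0/1 indicator column per factor level; data= replaces attach()
dummies <- model.matrix(~ parasite - 1, data = toy)

# Column names are the variable name pasted onto each level name
print(colnames(dummies))

# Each observation belongs to exactly one species, so every row sums to 1
print(rowSums(dummies))

# The hand-made indicator from the long-winded approach, for comparison
by_hand <- 1 * (toy$parasite == "vulgaris")
print(all(dummies[, "parasitevulgaris"] == by_hand))
```

Each column of dummies can then be used as a present/absent explanatory variable in the same way as the factors created one at a time above.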

The −1 in the model formula ensures that we create a dummy variable for each of the five parasite species (technically, ...
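To see the effect of the -1, the two forms of the formula can be compared side by side. This is a sketch on a hypothetical toy factor (the dataframe toy is made up for illustration), and it requests treatment contrasts explicitly through contrasts.arg so that the result does not depend on the session's contrasts option:

```r
# Hypothetical toy data: six made-up observations, not the real parasites.txt
toy <- data.frame(parasite = factor(c("vulgaris", "splendens", "knowlesii",
                                      "viridis", "kochii", "vulgaris")))

# With an intercept: a column of 1s plus indicators for all but the first level
with_intercept <- model.matrix(~ parasite, data = toy,
                               contrasts.arg = list(parasite = "contr.treatment"))

# Without an intercept: one indicator column for every level
no_intercept <- model.matrix(~ parasite - 1, data = toy)

print(colnames(with_intercept))
print(colnames(no_intercept))

# The level that has no column of its own when the intercept is kept
all_columns <- paste0("parasite", levels(toy$parasite))
dropped <- setdiff(all_columns, colnames(with_intercept))
print(dropped)
```

With the intercept retained, the first factor level acts as the reference and receives no column of its own; removing the intercept is what makes room for an indicator for every species.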
