Model Formulae in R

The structure of the model is specified in the model formula like this:

response variable ~ explanatory variable(s)

where the tilde symbol ~ reads ‘is modelled as a function of’ (see Table 9.3 for examples).

Table 9.3. Examples of R model formulae. In a model formula, the function I() (upper-case letter i) stands for ‘as is’ and is used for generating sequences, as in I(1:10), or for calculating quadratic terms, as in I(x^2).

So a simple linear regression of y on x would be written as

y ~ x

and a one-way ANOVA where sex is a two-level factor would be written as

y ~ sex
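
As a minimal sketch of how these two formulae might be passed to fitting functions, the example below uses simulated data. The variable values, the seed and the object names reg and aov1 are illustrative assumptions, not taken from the text.

```r
# Simulated data: all values and object names are illustrative assumptions
set.seed(1)
x   <- runif(20, 0, 10)                              # continuous explanatory variable
sex <- factor(rep(c("female", "male"), each = 10))   # two-level factor
y   <- 2 + 0.5 * x + rnorm(20)                       # continuous response

# Simple linear regression of y on x
reg <- lm(y ~ x)

# One-way ANOVA where sex is a two-level factor
aov1 <- aov(y ~ sex)

# Inspect which parameters each formula led to
print(names(coef(reg)))
print(names(coef(aov1)))
summary(aov1)
```

The formula has the same shape in both calls; whether the fit is a regression or an ANOVA follows from x being numeric and sex being a factor.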

The right-hand side of the model formula shows:

  • the number of explanatory variables and their identities – their attributes (e.g. continuous or categorical) are usually defined prior to the model fit;
  • the interactions between the explanatory variables (if any);
  • non-linear terms in the explanatory variables.
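
The three kinds of information listed above can be read back from a formula object without fitting anything. The sketch below assumes a hypothetical formula with two explanatory variables (x and sex), their interaction and a quadratic term; the formula itself is an illustration, not one of the book's examples.

```r
# A hypothetical formula: two main effects, one interaction, one quadratic term
f <- y ~ x + sex + x:sex + I(x^2)

# terms() parses the formula without needing any data
tt <- terms(f)

labs  <- attr(tt, "term.labels")  # every term on the right-hand side
ords  <- attr(tt, "order")        # 1 = main effect, 2 = two-way interaction
vars  <- all.vars(f)              # variable names, response included

print(labs)
print(ords)
print(vars)
```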

In some special cases, offsets or error terms can also be specified on the right of the tilde. As with the response variable, the explanatory variables can appear as transformations, or as powers or polynomials.
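
A short sketch of these options follows, assuming simulated data throughout. The names m_log, m_poly, m_quad, m_off, area and counts are hypothetical and chosen only for illustration; the Poisson model is one common setting for an offset, not the only one.

```r
# Simulated data: all values and object names are illustrative assumptions
set.seed(2)
x <- runif(30, 1, 10)
y <- exp(0.3 * x + rnorm(30, sd = 0.2))

# Transformation of the response variable
m_log <- lm(log(y) ~ x)

# Quadratic in x, written as an orthogonal polynomial and as a power term
m_poly <- lm(y ~ poly(x, 2))
m_quad <- lm(y ~ x + I(x^2))

# An offset on the right of the tilde: log(area) enters the linear
# predictor with its coefficient fixed at 1, so no parameter is estimated for it
area   <- runif(30, 1, 5)
counts <- rpois(30, lambda = 2 * area)
m_off  <- glm(counts ~ x + offset(log(area)), family = poisson)

print(names(coef(m_off)))
```

The two quadratic models use different parameterizations of the same curve, so their fitted values agree even though their coefficients differ.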

It is very important to note that symbols are used differently in model formulae than in arithmetic expressions. In particular:

+ indicates inclusion of an explanatory variable in the model (not addition); ...
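
To illustrate the difference between + as inclusion and + as arithmetic, the sketch below fits two models to simulated data. The variables x and z and the model names m_incl and m_sum are assumptions made for this example. Wrapping the expression in I() makes R treat + ‘as is’, that is, as ordinary addition.

```r
# Simulated data: all values and object names are illustrative assumptions
set.seed(3)
x <- rnorm(25)
z <- rnorm(25)
y <- 1 + 2 * x - z + rnorm(25)

# Here + means "include z as well as x": one slope is estimated for each
m_incl <- lm(y ~ x + z)

# Inside I(), + is arithmetic: a single slope is estimated for the sum x + z
m_sum <- lm(y ~ I(x + z))

print(names(coef(m_incl)))
print(names(coef(m_sum)))
```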
