The R Book

Box–Cox Transformations

Sometimes it is not clear from theory what the optimal transformation of the response variable should be. In these circumstances, the Box–Cox transformation offers a simple empirical solution. The idea is to find the power transformation, λ (lambda), that maximizes the likelihood when a specified set of explanatory variables is fitted to

$$\frac{y^{\lambda} - 1}{\lambda}$$

as the response. The value of lambda can be positive or negative, but the formula cannot be applied at λ = 0 (you would get a zero-divide error when it was applied to the response variable, y). For the case λ = 0 the Box–Cox transformation is therefore defined as log(y), which is the limit of the formula as λ approaches zero. Suppose that λ = −1. The formula now becomes

$$\frac{y^{-1} - 1}{-1} = 1 - \frac{1}{y}$$

and this quantity is regressed against the explanatory variables and the log-likelihood computed.
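The procedure just described can be sketched by hand, as a rough illustration rather than the way MASS implements it. The helpers `box_cox` and `profile_loglik` are hypothetical names introduced here, and R's built-in `trees` data set is used as a stand-in because it is available without reading a file. The log-likelihood is computed only up to an additive constant, which does not affect where the maximum lies.

```r
# Hypothetical helper (not part of MASS): Box-Cox transform of a positive vector y.
box_cox <- function(y, lambda) {
  # Use log(y) at lambda = 0, where the power formula would divide by zero.
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical helper: profile log-likelihood (up to an additive constant)
# for one lambda, given the response y and a design matrix X.
profile_loglik <- function(lambda, y, X) {
  fit <- lm.fit(X, box_cox(y, lambda))   # regress the transformed response on X
  n <- length(y)
  rss <- sum(fit$residuals^2)
  # Gaussian likelihood term plus the Jacobian of the transformation.
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

# Stand-in data (an assumption): the built-in `trees` data set.
X <- model.matrix(~ log(Girth) + log(Height), data = trees)
lambdas <- seq(-2, 2, by = 0.1)
ll <- sapply(lambdas, profile_loglik, y = trees$Volume, X = X)

# Grid value of lambda with the highest log-likelihood.
lambdas[which.max(ll)]
```

The Jacobian term `(lambda - 1) * sum(log(y))` is what makes log-likelihoods comparable across different values of lambda; without it, the residual sums of squares would be on different scales.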

In this example, we want to find the optimal transformation of the response variable, which is timber volume:

data<-read.delim("c:\\temp\\timber.txt")
attach(data)
names(data)

[1] "volume" "girth" "height"

We start by loading the MASS package of Venables and Ripley:

library(MASS)

The boxcox function is very easy to use: just specify the model formula, and the default options take care of everything else.

boxcox(volume ~ log(girth)+log(height))

The resulting plot of log-likelihood against lambda makes it clear that the optimal value of lambda is close to zero (i.e. the log transformation). ...
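If a numerical value is wanted rather than a reading from the plot, the list returned by boxcox can be inspected directly. The following is a minimal sketch, assuming the built-in `trees` data set as a stand-in for timber.txt (so the column names differ from those above); `lambda_hat` and `in_ci` are illustrative names, not part of MASS.

```r
library(MASS)

# Stand-in data (an assumption): built-in `trees` instead of timber.txt.
# plotit = FALSE suppresses the plot and returns the grid and log-likelihoods.
bc <- boxcox(Volume ~ log(Girth) + log(Height), data = trees,
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# bc$x holds the lambda values, bc$y the corresponding profile log-likelihoods.
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum.
in_ci <- bc$y > max(bc$y) - qchisq(0.95, 1) / 2
range(bc$x[in_ci])
```

A finer `lambda` grid gives a more precise estimate; if the interval includes zero, the log transformation is a reasonable and easily interpreted choice.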

The R Book by Michael J. Crawley
