
The R Book by Michael J. Crawley


Box–Cox Transformations

Sometimes it is not clear from theory what the optimal transformation of the response variable should be. In these circumstances, the Box–Cox transformation offers a simple empirical solution. The idea is to find the power transformation, λ (lambda), that maximizes the likelihood when a specified set of explanatory variables is fitted to

$$y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda}$$

as the response. The value of lambda can be positive or negative, but the formula cannot be applied when λ = 0 (you would get a zero-divide error when it was applied to the response variable, y). For the case λ = 0, the Box–Cox transformation is instead defined as log(y), which is the limit of the formula as λ → 0. Suppose instead that λ = −1. The formula now becomes

$$\frac{y^{-1} - 1}{-1} = 1 - \frac{1}{y}$$

and this quantity is regressed against the explanatory variables and the log-likelihood computed.
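The transformation itself is easy to write down in R. The sketch below is an illustration only: the function name boxcox_transform is hypothetical (it is not part of MASS), and the tolerance used to decide that λ counts as zero is an arbitrary choice. It checks the two special cases discussed above.

```r
# Hypothetical helper: Box-Cox transformation of a positive response y.
# For lambda (numerically) equal to zero the formula (y^lambda - 1) / lambda
# is undefined, so log(y), its limit as lambda -> 0, is used instead.
boxcox_transform <- function(y, lambda) {
  if (any(y <= 0)) stop("y must be strictly positive")
  if (abs(lambda) < 1e-8) {          # arbitrary tolerance for "lambda is zero"
    log(y)
  } else {
    (y^lambda - 1) / lambda
  }
}

y <- c(0.5, 1, 2, 4)
boxcox_transform(y, -1)   # same values as 1 - 1/y
boxcox_transform(y, 0)    # same values as log(y)
boxcox_transform(y, 1)    # same values as y - 1, a shift only
```

Note that λ = 1 merely shifts y by a constant, so it leaves the fitted model's slopes unchanged; it is the "no transformation" case.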

In this example, we want to find the optimal transformation of the response variable, which is timber volume:

data<-read.delim("c:\\temp\\timber.txt")
attach(data)
names(data)

[1] "volume" "girth" "height"
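If the timber.txt file is not to hand, a stand-in data frame with the same column names can be built from R's built-in trees data set (31 black cherry trees). This is an assumption for experimentation only: it is not the book's timber data, so any numbers it produces will differ from those in the text. The frame is named data, as in the book, so that attach(data) and the later boxcox call can be run unchanged.

```r
# Hypothetical stand-in for timber.txt, built from the built-in 'trees' data.
# Column names are lower-cased to match the book's file.
data <- data.frame(volume = trees$Volume,
                   girth  = trees$Girth,
                   height = trees$Height)
names(data)
```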

We start by loading the MASS library of Venables and Ripley:

library(MASS)

The boxcox function is very easy to use: just specify the model formula, and the default options take care of everything else.

boxcox(volume ~ log(girth)+log(height))
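For readers who want the numbers rather than the plot, the sketch below spells out the default grid of lambda values and sets plotit = FALSE, in which case boxcox returns a list whose x component holds the lambda values and whose y component holds the corresponding log-likelihoods. It assumes the built-in trees data as a stand-in for timber.txt (so results will not match the book), and passes the data frame through data = instead of using attach().

```r
library(MASS)

# Stand-in for the book's timber data (assumption: built-in 'trees' data).
timber <- data.frame(volume = trees$Volume,
                     girth  = trees$Girth,
                     height = trees$Height)

# Same model as above; the lambda grid shown is boxcox's default.
# plotit = FALSE returns the profile log-likelihood instead of plotting it.
bc <- boxcox(volume ~ log(girth) + log(height), data = timber,
             lambda = seq(-2, 2, 1/10), plotit = FALSE)

# Grid value of lambda with the highest log-likelihood.
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum, the cut-off boxcox marks on its plot.
in_ci <- bc$y > max(bc$y) - qchisq(0.95, 1) / 2
ci <- range(bc$x[in_ci])

lambda_hat
ci
```

Because the estimate is read off a grid, it is only as precise as the grid spacing; a finer lambda sequence can be supplied if needed.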

The resulting plot of log-likelihood against lambda makes it clear that the optimal value of lambda is close to zero (i.e. the log transformation). ...
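To see what boxcox is computing, here is a hand-rolled sketch of the profile log-likelihood: for each λ the response is transformed, regressed on the explanatory variables, and the log-likelihood is taken as −(n/2) log(RSS) + (λ − 1) Σ log(y), up to an additive constant. The second term is the Jacobian of the transformation, which keeps log-likelihoods comparable across different λ. The function name profile_loglik is hypothetical, the built-in trees data again stands in for timber.txt, and because the values are only defined up to a constant they may differ from boxcox's by a fixed amount while peaking at the same λ.

```r
# Stand-in for the book's timber data (assumption: built-in 'trees' data).
timber <- data.frame(volume = trees$Volume,
                     girth  = trees$Girth,
                     height = trees$Height)

# Hypothetical helper: profile log-likelihood of lambda, up to a constant,
# for a linear model with Box-Cox-transformed response y and design matrix X.
profile_loglik <- function(lambda, y, X) {
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum(lm.fit(X, z)$residuals^2)   # residual sum of squares
  n <- length(y)
  # Normal log-likelihood with sigma^2 profiled out, plus the Jacobian term.
  -n / 2 * log(rss) + (lambda - 1) * sum(log(y))
}

y <- timber$volume
X <- cbind(1, log(timber$girth), log(timber$height))   # intercept + predictors
lambdas <- seq(-2, 2, by = 0.1)
ll <- sapply(lambdas, profile_loglik, y = y, X = X)
lambdas[which.max(ll)]   # grid value with the highest log-likelihood
```

Dropping the Jacobian term would make values of λ that shrink the scale of y look spuriously good, since the residual sum of squares would fall simply because the transformed numbers are smaller.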
