Sometimes theory does not indicate what the optimal transformation of the response variable should be. In these circumstances, the Box–Cox transformation offers a simple empirical solution. The idea is to find the power transformation, λ (lambda), that maximizes the likelihood when a specified set of explanatory variables is fitted to

$$\frac{y^{\lambda} - 1}{\lambda}$$

as the response. The value of lambda can be positive or negative, but the formula cannot be applied when λ = 0 (you would get a zero-divide error when the formula was applied to the response variable, *y*). For the case λ = 0 the Box–Cox transformation is instead defined as log(*y*), which is the limit of the formula as λ approaches zero. Suppose that λ = −1. The formula now becomes

$$\frac{y^{-1} - 1}{-1} = 1 - \frac{1}{y}$$

and this quantity is regressed against the explanatory variables and the log-likelihood computed.
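The procedure just described can be sketched by hand in a few lines of R. This is a minimal illustration, not the code that MASS uses internally. It assumes R's built-in `trees` data set (columns `Volume`, `Girth`, `Height`) as a stand-in for the timber data, and the helper names `box_cox` and `profile_loglik` are hypothetical. The log-likelihood is computed only up to an additive constant, which does not affect the location of the maximum.

```r
# Stand-in response (assumption): Volume from the built-in 'trees' data set
y <- trees$Volume

# Box-Cox transform: (y^lambda - 1) / lambda, with log(y) used at lambda = 0
box_cox <- function(y, lambda, eps = 1e-8) {
  if (abs(lambda) < eps) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood for one value of lambda, up to an additive constant
profile_loglik <- function(lambda) {
  z <- box_cox(y, lambda)                        # transformed response
  fit <- lm(z ~ log(Girth) + log(Height), data = trees)
  n <- length(y)
  rss <- sum(residuals(fit)^2)                   # residual sum of squares
  # The second term is the Jacobian of the transformation; it makes
  # likelihoods comparable across different values of lambda
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

# Evaluate over a grid of candidate values and pick the maximum
lambdas <- seq(-2, 2, by = 0.1)
loglik <- sapply(lambdas, profile_loglik)
lambda_hat <- lambdas[which.max(loglik)]
lambda_hat
```

Plotting `loglik` against `lambdas` gives a curve of the same shape as the one `boxcox` draws. The Jacobian term matters: without it, the residual sums of squares for different values of λ are on different scales and cannot be compared directly.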

In this example, we want to find the optimal transformation of the response variable, which is timber volume:

```
data<-read.delim("c:\\temp\\timber.txt")
attach(data)
names(data)
[1] "volume" "girth" "height"
```

We start by loading the MASS library of Venables and Ripley:

`library(MASS)`

The boxcox function is very easy to use: just specify the model formula, and the default options take care of everything else.

`boxcox(volume ~ log(girth)+log(height))`

It is clear that the optimal value of lambda is close to zero (i.e. the log transformation). ...
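To read the optimum off numerically rather than from the plot, the values returned by `boxcox` can be inspected directly. The sketch below again assumes the built-in `trees` data set as a stand-in for the timber data, so its column names differ from those in the text. The finer `lambda` grid is a choice made here for illustration, not a default of the function.

```r
library(MASS)

# Stand-in data (assumption): built-in 'trees' rather than timber.txt
bc <- boxcox(Volume ~ log(Girth) + log(Height), data = trees,
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# bc$x holds the lambda grid, bc$y the profile log-likelihood at each value
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: all lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
inside <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[inside])

lambda_hat
ci
```

If the interval contains zero, the log transformation is consistent with the data, and it is usually preferred over an arbitrary fractional power because it is easier to interpret.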
