Box–Cox Transformations
Sometimes it is not clear from theory what the optimal transformation of the response variable should be. In these circumstances, the Box–Cox transformation offers a simple empirical solution. The idea is to find the power transformation, λ (lambda), that maximizes the likelihood when a specified set of explanatory variables is fitted to
(y^λ − 1)/λ
as the response. The value of lambda can be positive or negative, but the formula cannot be evaluated at zero (you would get a zero-divide error when it was applied to the response variable, y). For the case λ = 0 the Box–Cox transformation is therefore defined as log(y), which is the limit of the formula as λ approaches zero. Suppose that λ = −1. The formula now becomes
(y^(−1) − 1)/(−1) = 1 − 1/y
and this quantity is regressed against the explanatory variables and the log-likelihood computed.
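As a rough illustration of the transformation itself, the following sketch defines a small helper and checks the λ = −1 case numerically. The function name box_cox, the tolerance argument tol, and the test vector y are all hypothetical and are not part of MASS or of the original example; the helper also assumes that every response value is strictly positive.

```r
# Hypothetical helper (not from MASS): Box-Cox transform of a positive vector.
# Uses log(y) when lambda is numerically zero, the power formula otherwise.
box_cox <- function(y, lambda, tol = 1e-8) {
  stopifnot(all(y > 0))               # assumption: response is strictly positive
  if (abs(lambda) < tol) {
    log(y)
  } else {
    (y^lambda - 1) / lambda
  }
}

y <- c(0.5, 1, 2, 4)                  # made-up values, for illustration only

# With lambda = -1 the transform reduces to 1 - 1/y
all.equal(box_cox(y, -1), 1 - 1 / y)

# For lambda near zero the power formula approaches log(y)
all.equal(box_cox(y, 1e-4), log(y), tolerance = 1e-3)
```

Both comparisons should return TRUE, which is one way to see why log(y) is the natural definition at λ = 0.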
In this example, we want to find the optimal transformation of the response variable, which is timber volume:
data<-read.delim("c:\\temp\\timber.txt")
attach(data)
names(data)
[1] "volume" "girth" "height"
We start by loading the MASS package of Venables and Ripley:
library(MASS)
The boxcox function is very easy to use: just specify the model formula, and the default options take care of everything else.
boxcox(volume ~ log(girth)+log(height))
It is clear that the optimal value of lambda is close to zero (i.e. the log transformation). ...
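Reading the optimum off the plot can be supplemented by extracting it from the values that boxcox returns. The sketch below is an assumption-laden stand-in rather than the original analysis: because timber.txt is not available here, it uses R's built-in trees data frame (columns Girth, Height and Volume) in its place, and the names bc, lambda_hat, cutoff and ci are hypothetical.

```r
library(MASS)

# Stand-in data: the built-in 'trees' data frame, NOT the timber.txt file
# used in the text. plotit = FALSE returns the grid instead of drawing it.
bc <- boxcox(Volume ~ log(Girth) + log(Height), data = trees,
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# bc$x holds the lambda grid, bc$y the profile log-likelihood at each value
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y >= cutoff])

lambda_hat
ci
```

If the interval in ci includes zero, the log transformation is a reasonable choice for the response; a finer or wider lambda grid can be supplied if the maximum falls at the edge of the range.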