Praetz [1981] reviews the effect of autocorrelation on multivariable regression. For more on the use of instrumental variables, see Leigh and Schembri [2004]. Babyak [2004] provides a nontechnical introduction to the dangers of overfitting.

Inflation of R² as a consequence of multiple tests is also considered by Rencher [1980].

Osborne and Waters [2002] review tests of the assumptions of multivariable regression. Harrell, Lee, and Mark [1996] review the effect of violation of assumptions on generalized linear models and suggest the use of the bootstrap for model validation. Hosmer and Lemeshow [2001] recommend the use of the bootstrap or some other validation procedure before accepting the results of a logistic regression.

Diagnostic procedures for use in determining an appropriate functional form are described by Tukey and Mosteller [1977], Therneau and Grambsch [2000], Hosmer and Lemeshow [2001], and Hardin and Hilbe [2003].

Survival analysis may also be viewed as a generalized linear model, or GLM [McCullagh and Nelder, 1989, Chapter 13]. GLMs are considered in the next chapter.

Automated construction of a decision tree dates back to Morgan and Sonquist [1963]. Comparisons of the regression and tree approaches were made by Nurminen [2003] and Perlich, Provost, and Simonoff [2003]. Good [2011] expands on the appropriate use of decision trees.


1  That is, one dimension for risk of death, the dependent variable, and 19 for the explanatory variables.

2  Described ...

