Praetz [1981] reviews the effect of autocorrelation on multivariable regression. For more on the use of instrumental variables, see Leigh and Schembri [2004]. Babyak [2004] provides a nontechnical introduction to the dangers of overfitting.

Inflation of R² as a consequence of multiple tests is also considered by Rencher [1980].

Osborne and Waters [2002] review tests of the assumptions of multivariable regression. Harrell, Lee, and Mark [1996] review the effect of violation of assumptions on generalized linear models and suggest the use of the bootstrap for model validation. Hosmer and Lemeshow [2001] recommend the use of the bootstrap or some other validation procedure before accepting the results of a logistic regression.

Diagnostic procedures for use in determining an appropriate functional form are described by Tukey and Mosteller [1977], Therneau and Grambsch [2000], Hosmer and Lemeshow [2001], and Hardin and Hilbe [2003].

Survival analysis may also be viewed as a generalized linear model, or GLM [McCullagh and Nelder, 1989, Chapter 13]. GLMs are considered in the next chapter.

Automated construction of a decision tree dates back to Morgan and Sonquist [1963]. Comparisons of the regression and tree approaches were made by Nurminen [2003] and Perlich, Provost, and Simonoff [2003]. Good [2011] expands on the appropriate use of decision trees.


1  That is, one dimension for risk of death, the dependent variable, and 19 for the explanatory variables.

2  Described ...
