12 Beyond Linear Regression

Essentially, all models are wrong, but some models are useful.

George Box, from Empirical Model‐Building and Response Surfaces

Statistical methods using linear regression are based on the assumption that the errors, and hence the regression responses, are normally distributed. Variable transformations broaden the scope and applicability of linear regression to real applications, but many modeling problems cannot fit within the confines of these model assumptions.
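As a hedged illustration of how a variable transformation can bring a problem into the linear framework (the data and model here are made up, not from the text): if the response follows a multiplicative model y = a * exp(b*x) * error, taking logarithms yields a model that is linear in the parameters.

```r
# Illustrative sketch with simulated data: a log transform linearizes
# the multiplicative model y = a * exp(b*x) * error.
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 2 * exp(0.3 * x) * exp(rnorm(50, sd = 0.1))  # multiplicative lognormal error

fit.log <- lm(log(y) ~ x)   # on the log scale: log y = log(a) + b*x + e

coef(fit.log)  # intercept estimates log(2), slope estimates 0.3
```

On the raw scale a straight-line fit of y on x would be misspecified; after the transform, ordinary least squares recovers the parameters on the log scale.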

In some cases, the methods of linear regression are robust to minor violations of these assumptions, as diagnostic methods and simulation studies have shown. When the assumptions are more seriously violated, however, estimates and predictions based on the regression model are biased. Some residuals (the measured differences between the responses and the model's estimates of those responses) can then be overly large and wield undue influence on the estimated model. Observations associated with large residuals are called outliers; they inflate the error variance and reduce the power of the inferences made.
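A small simulated sketch (the numbers below are invented for illustration, not taken from the text) shows how a single gross outlier can dominate a least-squares fit and inflate the estimated error variance:

```r
# Simulated demonstration: contaminating one observation inflates the
# residual standard error and produces a large standardized residual.
set.seed(2)
x <- 1:20
y <- 1 + 0.5 * x + rnorm(20, sd = 0.3)
y.out <- y
y.out[20] <- y[20] + 15          # a single gross outlier

fit.clean <- lm(y ~ x)
fit.cont  <- lm(y.out ~ x)

summary(fit.clean)$sigma          # residual SD near the true 0.3
summary(fit.cont)$sigma           # inflated by the one outlier
rstandard(fit.cont)[20]           # large standardized residual flags it
```

The contaminated fit's residual standard error is several times larger than the clean fit's, and the standardized residual of the contaminated point stands far outside the usual range, which is how such observations show up in regression diagnostics.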

In other applications, parametric regression techniques are inadequate for capturing the true relationship between the response and the set of predictors. General “curve fitting” techniques for such data problems are introduced in Chapter 13, where the form of the regression model is left unspecified and is not necessarily linear.

In this chapter, we look at simple alternatives ...
