CAVEATS

Multivariable regression is plagued by the same problems univariate regression is heir to, plus many more of its own. Is the model correct? Are the associations spurious?

In the univariate case, if the errors were not normally distributed, we could take advantage of permutation methods to obtain exact significance levels in tests of the coefficients. Exact permutation methods do not exist in the multivariable case.
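For the univariate case, the idea can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the errors are exchangeable under the null hypothesis of zero slope, the data are made up, and the helper names (slope, exact_permutation_p) are hypothetical. With only six observations, all 6! = 720 rearrangements of the response can be enumerated, so the resulting significance level is exact rather than a Monte Carlo approximation.

```python
from itertools import permutations

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def exact_permutation_p(x, y):
    """Two-sided exact p-value for H0: slope = 0, by full enumeration."""
    observed = abs(slope(x, y))
    count = total = 0
    for perm in permutations(y):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(slope(x, perm)) >= observed - 1e-12:
            count += 1
    return count / total

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]

p = exact_permutation_p(x, y)
print(p)
```

Full enumeration is feasible only for small samples; for larger ones, a random sample of rearrangements is the usual substitute, at the cost of exactness.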

When selecting variables to incorporate in a multivariable model, we are forced to perform repeated tests of hypotheses, so that the resultant p-values are no longer meaningful. One solution, if sufficient data are available, is to divide the dataset into two parts, using the first part to select variables, and the second part to test these same variables for significance.
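The split-sample idea might look like the sketch below. Everything in it is an assumption made for illustration: the data are simulated so that only one of five candidate predictors matters, selection is simplified to picking the single predictor most strongly correlated with the response, and the test on the second half is a Monte Carlo permutation test of that one predictor rather than a full multivariable analysis.

```python
import random

def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed, for reproducibility only

# Simulated data (an assumption): 5 candidate predictors, only column 2 matters.
n, k = 80, 5
X = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
y = [2.0 * row[2] + rng.gauss(0, 0.5) for row in X]

# Split the rows at random into a selection half and a testing half.
rows = list(range(n))
rng.shuffle(rows)
select_rows, test_rows = rows[: n // 2], rows[n // 2:]

def column(j, idx):
    """Values of predictor j for the given row indices."""
    return [X[i][j] for i in idx]

# Step 1: choose a predictor using the selection half only.
y_sel = [y[i] for i in select_rows]
chosen = max(range(k), key=lambda j: abs(corr(column(j, select_rows), y_sel)))

# Step 2: test the chosen predictor on the untouched half.
x_test = column(chosen, test_rows)
y_test = [y[i] for i in test_rows]
observed = abs(corr(x_test, y_test))
n_perm = 999
hits = 0
for _ in range(n_perm):
    shuffled = y_test[:]
    rng.shuffle(shuffled)
    if abs(corr(x_test, shuffled)) >= observed:
        hits += 1
p_value = (hits + 1) / (n_perm + 1)
print(chosen, p_value)
```

Because the second half played no part in choosing the predictor, the p-value computed on it is not contaminated by the selection step.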

If choosing the correct functional form of a model in the univariate case presents difficulties, consider that in the case of k variables, there are k linear terms (should we use logarithms? should we add polynomial terms?) and k(k − 1)/2 distinct first-order cross products of the form x_i x_j with i < j. Should we include any of the k(k − 1)(k − 2)/6 second-order cross products of three distinct variables?
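To get a feel for how quickly the list of candidate terms grows, the short sketch below enumerates them for a few values of k. It counts each product of distinct variables once, regardless of the order of its factors, and it ignores transformations and polynomial terms entirely; the function name candidate_terms is hypothetical.

```python
from itertools import combinations

def candidate_terms(k):
    """Count linear terms, pairwise products, and three-way products of k variables."""
    variables = range(k)
    linear = k
    pairs = len(list(combinations(variables, 2)))    # products of two distinct variables
    triples = len(list(combinations(variables, 3)))  # products of three distinct variables
    return linear, pairs, triples

for k in (3, 5, 10):
    print(k, candidate_terms(k))
```

With ten predictors there are already 45 pairwise and 120 three-way products to consider, before any question of logarithms or polynomial terms arises.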

A common error is to attribute the strength of a relationship to the magnitude of the predictor’s regression coefficient (see, for example, Moyé, 2000, p. 213). Merely changing the units in which the predictor is reported changes the magnitude of its coefficient, which shows how erroneous such an attribution is.
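A small numerical sketch makes the point. The heights and weights below are made up, and the helper slope_and_corr is hypothetical; the only thing that differs between the two fits is whether height is recorded in metres or in centimetres.

```python
def slope_and_corr(x, y):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

# Made-up data for illustration only.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [52.0, 58.0, 67.0, 74.0, 85.0]
height_cm = [100 * h for h in height_m]  # same heights, different units

b_m, r_m = slope_and_corr(height_m, weight_kg)
b_cm, r_cm = slope_and_corr(height_cm, weight_kg)

print("slope per metre:     ", b_m)
print("slope per centimetre:", b_cm)
print("correlations:        ", r_m, r_cm)
```

The coefficient is about 82 kg per metre in one fit and about 0.82 kg per centimetre in the other, a hundredfold difference, while the correlation between height and weight is the same in both.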

The regression coefficient is the correlation coefficient multiplied by the ratio of ...
