4 SOME COMMENTS ON ASSUMPTIONS

4.1 INTRODUCTION

Much of what is said here can be derived from any of the many introductory statistics books by David Moore et al. (for example, Baldi and Moore, 2012) or from “The Statistical Sleuth” by Ramsey and Schafer (2002), although no one seems to have collected it all in one place.

A wide variety of statistical procedures (regression, t-tests, ANOVA) require three assumptions:

  1. Normal observations or errors.
  2. Independent observations (or independent errors, which, in normal linear models, is equivalent to independent observations).
  3. Equal variance, when that is appropriate (for the one-sample t-test, for example, nothing is being compared, so the equal-variance assumption does not apply).

These assumptions provide a minimal set of conditions for deriving a formula, whether that formula is a test statistic with a known distribution or a confidence interval. The derivation itself is an exercise in pure mathematics. Once the formula has been derived, however, the applied mathematician quickly asks: are the assumptions actually necessary for the formula to perform properly? This is the key insight: the assumptions required to derive a formula may or may not be crucial for using it.

What is meant by “performs properly”? A 95% confidence interval behaves properly if, in the long run, over many distinct random samples from a population, it contains the true population parameter about 95% of the time. A test statistic always produces ...
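
The long-run coverage idea for a confidence interval can be checked by simulation. The sketch below is illustrative only and is not taken from the text: the sample size (n = 10), the number of replications, the seed, the two populations (standard normal and exponential with mean 1), and the helper names `t_interval` and `coverage` are all assumptions chosen for the example. It uses only the Python standard library, so the t critical value for 9 degrees of freedom is hard-coded from a standard t table.

```python
import math
import random
import statistics

# 0.975 quantile of Student's t with 9 degrees of freedom.
# Assumption: this value is only valid for samples of size n = 10.
T_CRIT_9DF = 2.262


def t_interval(sample, t_crit=T_CRIT_9DF):
    """Return the classical 95% t-interval for the mean of `sample`."""
    n = len(sample)
    mean = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage(draw, true_mean, n=10, reps=20000, seed=1):
    """Fraction of simulated samples whose t-interval contains `true_mean`.

    `draw` is a hypothetical helper argument: a function that takes a
    random.Random instance and returns one observation.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        lower, upper = t_interval(sample)
        hits += lower <= true_mean <= upper
    return hits / reps


# Normality holds: observations are N(0, 1), so the true mean is 0.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)

# Normality fails: observations are exponential with rate 1, so the true mean is 1.
skewed_cov = coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)

print(f"normal population:      {normal_cov:.3f}")
print(f"exponential population: {skewed_cov:.3f}")
```

When the normality assumption holds, the estimated coverage should land close to 0.95. For a strongly skewed population at this small sample size it typically falls somewhat short. Comparing the two numbers is one way to see how much a particular assumption matters for using the formula, as opposed to deriving it.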
