Much of what is said here can be derived from any of the many introductory statistics books by David Moore et al. (for example, Baldi and Moore, 2012) and/or “The Statistical Sleuth” by Ramsey and Schafer (2002), although no one seems to have written it all in one place.
A wide variety of statistical procedures (regression, t-tests, ANOVA) require three assumptions:
- Normal observations or errors.
- Independent observations (or independent errors, which, in normal linear models, is equivalent to independent observations).
- Equal variance, when that is appropriate (for the one-sample t-test, for example, nothing is being compared, so the equal-variance assumption does not apply).
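The following minimal sketch makes the list concrete. It computes the pooled two-sample t statistic in plain Python, with no statistics library assumed. The helper name `pooled_t` is hypothetical.

The equal-variance assumption is what justifies pooling the two groups' variances. Independence and normality are what give the resulting statistic a t distribution with n_x + n_y − 2 degrees of freedom.

```python
import math


def pooled_t(x, y):
    """Pooled two-sample t statistic and its degrees of freedom.

    Pooling the variances is only justified if both groups share a common
    variance (the equal-variance assumption). Referring the result to a t
    distribution with nx + ny - 2 df also relies on independent, normally
    distributed observations.
    """
    nx, ny = len(x), len(y)
    mean_x, mean_y = sum(x) / nx, sum(y) / ny
    # Sums of squared deviations within each group.
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    df = nx + ny - 2
    pooled_var = (ss_x + ss_y) / df
    std_err = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean_x - mean_y) / std_err, df


t_stat, df = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t_stat:.4f} on {df} df")  # t = -3.6742 on 4 df
```

If the groups plainly have unequal spreads, the pooled variance no longer estimates a single common quantity. That is exactly the point at which the third assumption enters the derivation.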
These assumptions are a minimal set of conditions under which a formula can be derived, whether that formula is a test statistic with a known distribution or a confidence interval. The derivation itself is an exercise in pure mathematics. Once the formula has been derived, however, the applied mathematician quickly asks a different question: are the assumptions actually necessary for the formula to perform properly? This is the key insight: the assumptions required to derive a formula may or may not be crucial to using it.
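As a worked instance, consider the standard textbook derivation of the one-sample t interval. Assume $X_1, \dots, X_n$ are independent draws from $N(\mu, \sigma^2)$. The assumptions are what deliver the exact distribution of $T$, and the interval follows by rearranging the probability statement:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2, \qquad
T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}

P\!\left(-t_{n-1,\,0.975} \le T \le t_{n-1,\,0.975}\right) = 0.95
\;\Longrightarrow\;
\bar{X} \pm t_{n-1,\,0.975}\,\frac{S}{\sqrt{n}}
```

Normality and independence are used only to obtain the exact $t_{n-1}$ distribution of $T$. Whether the resulting interval still behaves well when the data are not normal is the separate, applied question.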
What is meant by “performs properly”? A 95% confidence interval behaves properly if, in the long run, over many distinct random samples from some population, it contains the true population parameter about 95% of the time. A test statistic always produces ...
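The long-run definition of a properly behaving 95% interval can be checked directly by simulation. The following minimal sketch in plain Python rests on a few assumptions chosen for illustration:

- samples of size n = 10;
- a hardcoded t critical value of about 2.262 for 9 degrees of freedom;
- two example populations: a standard normal, and a right-skewed exponential with mean 1.

`t_interval` and `coverage` are hypothetical helper names, not part of any library.

```python
import math
import random

# Hardcoded 0.975 quantile of the t distribution with 9 df.
# Assumption: the sample size is fixed at n = 10; change this if n changes.
T_CRIT_DF9 = 2.262157


def t_interval(sample, t_crit):
    """One-sample t interval: mean +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((v - xbar) ** 2 for v in sample) / (n - 1)
    half_width = t_crit * math.sqrt(s2 / n)
    return xbar - half_width, xbar + half_width


def coverage(draw, true_mean, n=10, reps=20_000, t_crit=T_CRIT_DF9, seed=1):
    """Fraction of simulated intervals that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        lo, hi = t_interval(sample, t_crit)
        if lo <= true_mean <= hi:
            hits += 1
    return hits / reps


# Normal population with mean 0: every assumption behind the interval holds.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
# Exponential population with mean 1: right-skewed, so normality fails.
skewed_cov = coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)
print(f"normal population:      {normal_cov:.3f}")
print(f"exponential population: {skewed_cov:.3f}")
```

Comparing each printed proportion with 0.95 is one way to ask whether the normality assumption matters for this interval at this sample size. The figure for the skewed case will depend on n and on how skewed the population is.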