Much of what is said here can be derived from any of the introductory statistics books by David Moore and coauthors (for example, Baldi and Moore, 2012) or from “The Statistical Sleuth” by Ramsey and Schafer (2002), although no one seems to have collected it all in one place.
A wide variety of statistical procedures (regression, t-tests, ANOVA) require three assumptions:
These assumptions provide a minimal set of conditions required to derive a formula, whether that formula is a test statistic with a known distribution or a confidence interval. The derivation itself is an exercise in pure mathematics. Once the formula has been derived, however, the applied mathematician quickly asks: are the assumptions actually necessary for the formula to perform properly? This is the key insight: the assumptions required to derive a formula may or may not be crucial for using it.
What is meant by “perform properly”? A 95% confidence interval behaves properly if, in the long run, over many distinct random samples from some population, it contains the true population parameter about 95% of the time. A test statistic always produces ...
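The long-run behavior of a 95% confidence interval can be checked by simulation. The Python sketch below is only an illustration under stated assumptions, not a procedure taken from the cited texts. It draws many random samples of size 30, builds the usual one-sample t interval for the mean from each, and reports the fraction of intervals that contain the true mean. The helper names (`t_interval`, `coverage`), the two example populations, the sample size, and the hard-coded critical value 2.045 (the tabulated two-sided 95% value for 29 degrees of freedom) are all choices made for this example.

```python
import math
import random
import statistics

# Two-sided 95% critical value of Student's t with 29 degrees of freedom,
# taken from a standard t table. It is only valid for samples of size n = 30.
T_CRIT_DF29 = 2.045


def t_interval(sample, t_crit):
    """Return (lower, upper) for the one-sample t confidence interval of the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage(draw, true_mean, n=30, reps=5000, t_crit=T_CRIT_DF29, seed=0):
    """Fraction of simulated intervals that contain the true population mean.

    `draw` is a function that takes a random.Random instance and returns one
    observation from the population under study.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        lower, upper = t_interval(sample, t_crit)
        hits += lower <= true_mean <= upper  # True counts as 1
    return hits / reps


# Normal population (mean 10, standard deviation 2): normality holds.
normal_cov = coverage(lambda rng: rng.gauss(10, 2), true_mean=10)

# Exponential population (mean 1): strongly right-skewed, so normality fails.
skewed_cov = coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)

print(f"normal population:      {normal_cov:.3f}")
print(f"exponential population: {skewed_cov:.3f}")
```

Under these assumptions, the coverage for the normal population should land close to 0.95, while the skewed population typically comes out somewhat lower. Comparing the two numbers is one way to judge how much a violated assumption matters for a given sample size.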