5

THE BOOTSTRAP

In the previous chapters we learned about sampling distributions and some ways to compute or estimate them. A common feature of the earlier examples is that the relevant populations were known: for example, a binomial distribution with specified p or an exponential distribution with specified λ.
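As a reminder of what "known population" buys us, here is a minimal sketch in Python of approximating a sampling distribution by simulation. It assumes an exponential population with rate λ = 2 and samples of size 30; these values, the function name, and the choice of the sample mean as the statistic are illustrative assumptions, not taken from the earlier chapters.

```python
import random
import statistics

def simulate_sampling_distribution(rate, n, reps, seed=0):
    """Approximate the sampling distribution of the sample mean for
    samples of size n drawn from an exponential population with the
    given rate (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        # Because the population is known, we can draw fresh samples from it.
        sample = [rng.expovariate(rate) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Assumed values: rate = 2.0, n = 30, 5000 replications.
means = simulate_sampling_distribution(rate=2.0, n=30, reps=5000)
center = statistics.fmean(means)   # theory: 1/rate = 0.5
spread = statistics.stdev(means)   # theory: (1/rate)/sqrt(n), about 0.091
print(round(center, 3), round(spread, 3))
```

The simulated center and spread should land close to the theoretical values noted in the comments. This is possible only because we can generate as many new samples from the population as we like.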

You may protest: what about permutation distributions or goodness-of-fit tests? The populations were not known there. But even in such situations, we were concerned only with sampling distributions when the null hypothesis is true. For example, we assumed that the true means (and spreads and shapes) of two populations are the same, or that the distribution of birthdays is uniform across quarters, or that the home run counts come from a Poisson distribution. These assumptions provided enough additional information that our sampling was, in effect, from known populations. Thus, in permutation testing, under the null hypothesis, we could pool the data and draw samples without replacement, using the pooled data as the known population.
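The pooling step can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the two small data sets are made up, the statistic is the difference in means, and the two-sided P-value convention (counting resampled differences at least as large in absolute value, with the usual add-one adjustment) is one common choice rather than the only one.

```python
import random
import statistics

def permutation_test(x, y, reps=9999, seed=0):
    """Two-sample permutation test for a difference in means
    (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    observed = statistics.fmean(x) - statistics.fmean(y)
    pooled = list(x) + list(y)   # under the null, the pooled data act as the known population
    n_x = len(x)
    count = 0
    for _ in range(reps):
        # Shuffling and taking the first n_x values is the same as
        # drawing n_x values from the pooled data without replacement.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:])
        if abs(diff) >= abs(observed):
            count += 1
    p_value = (count + 1) / (reps + 1)
    return observed, p_value

# Made-up data, for illustration only.
x = [12.1, 14.3, 13.8, 15.2, 14.9]
y = [10.4, 11.9, 12.5, 11.1, 12.8]
observed, p_value = permutation_test(x, y)
print(observed, p_value)
```

Note that the resampling never looks outside the pooled data: the null hypothesis is what justifies treating the pooled values as the population.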

We now move from the realm of probability to statistics, from situations where the population is known to where it is unknown. If all we have are data and a statistic estimated from the data, we need to estimate the sampling distribution of the statistic. In this chapter, we introduce one way to do so, the bootstrap.

5.1 INTRODUCTION TO THE BOOTSTRAP

For the North Carolina data (Case Study in Section 1.2), the mean weight of the 1009 babies in the sample is 3448.26 ...
