6

SURVEYS AND SAMPLING

We have been discussing surveys and samples; let us now focus a bit on their history and theory. After completing this chapter, you should be able to

  • specify what is required for a simple random sample (SRS),
  • specify the resampling procedure to determine the sampling distribution of a proportion,
  • be conversant with the vocabulary of statistical sampling (samples, populations, parameters, statistics, and sampling frame),
  • specify the resampling procedure to determine the sampling distribution of a mean,
  • describe and implement the bootstrap,
  • describe sampling schemes that may be employed when simple random sampling is infeasible,
  • explain the bias caused by self-selection and nonresponse,
  • explain the relationship between required sample sizes for a population of 300,000 versus a population of 300 million.

Although some consider survey analysis to belong only to the research community, data scientists should take note. Big data are not necessarily good data: as we will see in the following sections, a well-designed small-sample survey can produce more accurate results than a huge dataset that just happens to be lying around.
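The claim that a small, well-designed sample can beat a much larger but poorly collected one can be illustrated with a short simulation. The sketch below is only an illustration under assumed values, not an example from this chapter: the population size, the true share of 50%, the response probabilities, and the function name `compare_samples` are all hypothetical. It draws a simple random sample of 1,000 and compares it with a far larger self-selected sample in which one group is assumed to respond twice as often as the other.

```python
import random


def compare_samples(seed=0, pop_size=300_000, true_share=0.5,
                    srs_n=1_000, p_respond_yes=0.50, p_respond_no=0.25):
    """Compare a small SRS with a large self-selected sample.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Hypothetical population: 1 = holds the opinion, 0 = does not.
    n_yes = int(pop_size * true_share)
    population = [1] * n_yes + [0] * (pop_size - n_yes)

    # Simple random sample: every subset of size srs_n is equally likely.
    srs = rng.sample(population, srs_n)
    srs_estimate = sum(srs) / srs_n

    # Self-selected sample: people who hold the opinion are assumed to be
    # more likely to respond, so the sample is large but biased.
    volunteers = [
        x for x in population
        if rng.random() < (p_respond_yes if x == 1 else p_respond_no)
    ]
    volunteer_estimate = sum(volunteers) / len(volunteers)

    return srs_estimate, len(volunteers), volunteer_estimate


srs_est, n_volunteers, volunteer_est = compare_samples()
print(f"True share:                 0.500")
print(f"SRS estimate (n=1000):      {srs_est:.3f}")
print(f"Self-selected (n={n_volunteers}): {volunteer_est:.3f}")
```

Under these assumptions, the self-selected sample is expected to settle near two thirds rather than one half, and collecting more volunteers does not remove that error, whereas the error of the random sample is due to chance alone and shrinks as the sample grows.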

6.1 SIMPLE RANDOM SAMPLES

By the end of 1936, the United States had shown signs of economic recovery from the Great Depression, which started with the Wall Street crash of 1929. GDP was back to where it had been in 1929; it had fallen by a third in the interim. Unemployment headed back to 15%, after having risen to 25% during ...

