Practical Statistics for Data Scientists, 2nd Edition

by Peter Bruce, Andrew Bruce, Peter Gedeck
May 2020
O'Reilly Media, Inc.

Chapter 2. Data and Sampling Distributions

A popular misconception holds that the era of big data means the end of the need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data and to minimize bias. Even in a big data project, predictive models are typically developed and piloted with samples. Samples are also used in tests of various sorts (e.g., comparing the effect of web page designs on clicks).

Figure 2-1 shows a schematic that underpins the concepts we will discuss in this chapter—data and sampling distributions. The lefthand side represents a population that, in statistics, is assumed to follow an underlying but unknown distribution. All that is available is the sample data and its empirical distribution, shown on the righthand side. To get from the lefthand side to the righthand side, a sampling procedure is used (represented by an arrow). Traditional statistics focused very much on the lefthand side, using theory based on strong assumptions about the population. Modern statistics has moved to the righthand side, where such assumptions are not needed.
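The relationship between the two sides of the schematic can be illustrated with a small simulation. The following is a minimal sketch, not taken from the book: the population here is hypothetical (100,000 values drawn from an exponential distribution, chosen only for illustration), and `draw_sample` is an illustrative helper name. In practice the population would be unknown and only the sample would be available.

```python
import random
import statistics


def draw_sample(population, n, seed=0):
    """Return a simple random sample of size n, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population (assumption): normally we would never see this.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(100_000)]

# The sampling procedure (the arrow in the schematic).
sample = draw_sample(population, n=1_000, seed=1)

# Summaries of the population and of the sample's empirical distribution.
pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)

print(f"population mean: {pop_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f}")
print(f"sample median:   {sample_median:.3f}")
```

The sample statistics describe the empirical distribution on the righthand side; they approximate, but do not equal, the corresponding population values.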

In general, data scientists need not worry about the theoretical nature of the lefthand side and instead should focus on the sampling procedures and the data at hand. There are some notable exceptions. Sometimes data is generated from a physical process that can be modeled. The simplest example is flipping a coin: ...
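A process such as coin flipping can be simulated directly. The sketch below assumes a fair coin (the probability of 0.5 and the helper name `flip_coins` are assumptions for illustration, not from the text) and reports the observed fraction of heads.

```python
import random


def flip_coins(n_flips, p_heads=0.5, seed=0):
    """Simulate n_flips coin flips; True means heads.

    p_heads=0.5 assumes a fair coin (an assumption for this sketch).
    """
    rng = random.Random(seed)
    return [rng.random() < p_heads for _ in range(n_flips)]


flips = flip_coins(10_000)
heads_fraction = sum(flips) / len(flips)
print(f"fraction of heads in {len(flips)} flips: {heads_fraction:.3f}")
```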


ISBN: 9781492072935