Chapter 4 Data Sample, Data Population, and Data Distribution

4.1 Introduction and Overview

The distinction between a data sample and the underlying data population is not always kept in focus, whether consciously or subconsciously. For instance, the mean concentration of a chemical contaminant computed from a data sample (laboratory measurements of a few soil samples from a contaminated site) is often subconsciously treated as the definitive mean concentration of the contaminant. However, the true but unknown mean concentration, which is theoretically based on the entire data population of all possible soil measurements of the contaminant at the site (usually unavailable because of sampling cost constraints), can differ substantially from the sample mean. Even the word “sample” can be mistaken for the physical quantity of soil, water, air, or other environmental medium used for laboratory analysis unless it is prefixed by “data” (as in “data sample”).
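The gap between a sample mean and the population mean can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the "population" is a hypothetical set of 10,000 right-skewed (lognormal) concentration values in mg/kg, invented here for illustration, and the helper name `simulate_sample_means` is not from the book. It draws many small data samples and shows how far their means can stray from the population mean.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, n_draws, seed=0):
    """Draw n_draws random data samples (without replacement within each
    sample) and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_draws)
    ]


# Hypothetical data population: all possible soil measurements at a site.
# The lognormal shape and its parameters are assumptions for illustration only.
pop_rng = random.Random(42)
population = [pop_rng.lognormvariate(1.0, 1.0) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Means of many small data samples (5 measurements each) versus larger ones.
means_n5 = simulate_sample_means(population, sample_size=5, n_draws=1000)
means_n50 = simulate_sample_means(population, sample_size=50, n_draws=1000)

print(f"population mean:            {population_mean:.2f}")
print(f"sample means, n=5  (range): {min(means_n5):.2f} to {max(means_n5):.2f}")
print(f"sample means, n=50 (range): {min(means_n50):.2f} to {max(means_n50):.2f}")
```

Under these assumptions, the means of five-measurement samples scatter widely around the population mean, and the scatter narrows as the sample size grows. In practice the population is not available, so only one such sample mean is ever observed.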

The term “distribution” appears throughout statistics. Examples include frequency distribution, continuous probability distribution, probability distribution function, cumulative distribution function, and empirical cumulative distribution function. Almost every chapter in this book contains references to distributions, in connection with basic descriptive statistics, probabilistic statements about the data population based on sample statistics (e.g., the proportion or percentage ...
