The ability to generate summaries and make general statements about the data, and relationships within the data, is at the heart of exploratory data analysis and data mining methods. In almost every situation we will be making general statements about entire populations, yet we will be using a subset or sample of observations. The distinction between a *population* and a *sample* is important:

**Population**: A precise definition of all possible outcomes, measurements or values about which inferences will be made.

**Sample**: A portion of the population that is representative of the entire population.

*Parameters* are numbers that characterize a population, whereas *statistics* are numbers that summarize the data collected from a sample of the population. For example, a market researcher asks a sample of the consumers of a particular product about their preferences and uses this information to make general statements about all consumers. The entire population of interest must be defined (i.e. all consumers of the product). Care must be taken in selecting the sample, since it must be an unbiased, random sample from the entire population. Using this carefully selected sample, it is possible to make confident statements about the population in any exploratory data analysis or data mining project.
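The distinction between a parameter and a statistic can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the population is synthetic (10,000 hypothetical consumers, each with a preference score from 1 to 5), the sample size of 200 is arbitrary, and `draw_sample` is a hypothetical helper name. In practice the full population is rarely available, which is why the statistic is used to estimate the parameter.

```python
import random
import statistics


def draw_sample(population, n, seed=0):
    """Draw a simple random sample of n items without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Synthetic population (an assumption for illustration): one preference
# score between 1 and 5 for each of 10,000 hypothetical consumers.
rng = random.Random(42)
population = [rng.randint(1, 5) for _ in range(10_000)]

# Every consumer has the same chance of being selected.
sample = draw_sample(population, 200, seed=1)

# Parameter: characterizes the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Statistic: summarizes only the sampled observations.
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.3f}")
print(f"sample mean (statistic):     {statistic:.3f}")
```

With an unbiased random sample, the sample mean should fall close to the population mean, although the two will rarely be identical.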

Statistical methods can play an important role, including:

**Summarizing the data**: Statistics not only provide us with methods for summarizing sample data ...
