Chapter 15. Statistics

There are three kinds of lies: lies, damned lies, and statistics.

Benjamin Disraeli (1804-1881)

Statistics is the science of quantifying conjectures. How likely is an event? How much does it depend on other events? Was an event due to chance, or is it attributable to another cause? And for whatever answers you might have for these questions, how confident are you that they’re correct?

Statistics is not the same as probability, but the two are deeply intertwined and on occasion blend together. The proper distinction between them is this: probability is a mathematical discipline, and probability problems have unique, correct solutions. Statistics is concerned with the application of probability theory to particular real-world phenomena.

A more colloquial distinction is that probability deals with small amounts of data, and statistics deals with large amounts. As you saw in the last chapter, probability uses random numbers and random variables to represent individual events. Statistics is about situations: given poll results, or medical studies, or web hits, what can you infer? Probability began with the study of gambling; statistics has a more sober heritage. It arose primarily because of the need to estimate population, trade, and unemployment.

In this chapter, we’ll begin with some simple statistical measures: mean, median, mode, variance, and standard deviation. Then we’ll explore significance tests, which tell you how sure you can be that some phenomenon ...
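
As a rough illustration of the five measures named above, here is a minimal sketch in Python using only the standard `statistics` module. The language choice, the helper name `summarize`, and the sample data are assumptions made for this example, not taken from the chapter. The sketch also assumes the data is the whole population, so it uses the population variance and standard deviation rather than the sample versions.

```python
import statistics


def summarize(data):
    """Return the basic descriptive measures for a list of numbers.

    Hypothetical helper for illustration only. Treats `data` as the whole
    population, so variance and standard deviation divide by n, not n - 1.
    """
    return {
        "mean": statistics.mean(data),           # arithmetic average
        "median": statistics.median(data),       # middle value of the sorted data
        "mode": statistics.mode(data),           # most frequent value
        "variance": statistics.pvariance(data),  # mean squared deviation from the mean
        "stddev": statistics.pstdev(data),       # square root of the variance
    }


# Made-up sample data with a single most-frequent value.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(sample)

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the mean is 5, the median is 4.5, the mode is 4, the population variance is 4, and the standard deviation is 2. If the data were a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual substitutes.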
