As a seasoned statistical scientist, I like to think I’m invincible when it comes to drawing reliable conclusions from data. I’m not, of course. Nobody is. Even the world’s best data analysts make mistakes now and then. This is what makes us human.

Just recently, for example, I was humbled by the simplest of all statistical techniques: the confidence interval. I was working with a government panel, helping them to establish criteria for certifying devices that detect certain toxic substances. (Smoke detectors, for example, are certified so you know they’re reliable; in other words, they’re likely to sound an alarm when there’s smoke, and keep quiet when there isn’t.) The committee members wanted to know how many samples to test in order to reach a certain confidence level on the probability of detection: the probability that, given the toxin is present, the device will actually sound an alarm.

No problem, I thought.

Back in my office, I grabbed a basic statistics book, pulled out the formula for a confidence interval of a proportion (or probability), and went to work. I began calculating the confidence bounds on the probability of detection for different testing scenarios, preparing recommendations as I went along. It wasn’t until some time later that I realized all my calculations were wrong.
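For readers who want to see what such a textbook calculation looks like, here is a minimal sketch. It assumes the formula in question is the usual normal-approximation (Wald) interval for a proportion; the text does not say which formula was actually used, and the function name `wald_interval` and the sample counts are purely illustrative.

```python
import math
from statistics import NormalDist


def wald_interval(detections, trials, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: this is the "basic statistics book" formula; the original
    text does not name the formula that was used.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= detections <= trials:
        raise ValueError("detections must be between 0 and trials")

    p_hat = detections / trials  # observed proportion of detections
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    # Clip to [0, 1], since a probability cannot fall outside that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


if __name__ == "__main__":
    # Hypothetical scenario: 45 alarms in 50 trials with the toxin present.
    lower, upper = wald_interval(45, 50)
    print(f"95% interval for probability of detection: {lower:.3f} to {upper:.3f}")
```

Running the same function over several values of `trials` is one way to compare testing scenarios, which is roughly the exercise described above.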

Well, not wrong; the formulas and numbers were correct. But they didn’t really fit my problem. When I started the calculations, ...

