“There is nothing so uncertain as a sure thing.”
It’s easy to get cocky when armed with a little bit of data. We get a sense of knowing something about the world, and it increases our confidence in what we have to say. That can be a good thing. But the world is a noisy, chaotic, ever-changing place. If the data set we’re working with is noisy, or if the inferences we’re making are dubious at best, we’ll need to proceed with caution.
The core principle to employ when communicating data is to be honest about what we know and what we don’t know, and to represent reality to the best of our ability. If there is a high degree of variation in the data, or if we’re only working with a limited sample, we should make that clear to our audience. Doing otherwise would be misleading.
In this chapter, we’ll consider two humbling and unavoidable aspects of communicating data: variation and uncertainty. By variation, we mean the degree to which individual observations differ from others in a group. By uncertainty, we mean the lack of confidence in inferences about a population based on data collected from samples.
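To make the distinction between variation and uncertainty concrete, here is a minimal Python sketch. It assumes a hypothetical, normally distributed population (mean 100, standard deviation 15) and uses the standard error of the mean as one common way to quantify uncertainty about a sample mean. That choice is ours for illustration, not necessarily the measure used later in this chapter. The helper name `variation_and_uncertainty` is likewise hypothetical.

```python
import math
import random
import statistics


def variation_and_uncertainty(sample):
    """Return (sd, sem) for a list of numbers.

    sd:  sample standard deviation, i.e. how much individual
         observations differ from one another (variation).
    sem: standard error of the mean, i.e. how precisely the sample
         mean estimates the population mean (one measure of uncertainty).
    """
    sd = statistics.stdev(sample)      # uses the n - 1 denominator
    sem = sd / math.sqrt(len(sample))  # shrinks as the sample grows
    return sd, sem


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    # Hypothetical population: normal with mean 100 and sd 15.
    for n in (10, 100, 1000):
        sample = [rng.gauss(100, 15) for _ in range(n)]
        sd, sem = variation_and_uncertainty(sample)
        print(f"n={n:5d}  sd={sd:6.2f}  sem={sem:5.2f}")
```

As `n` grows, `sd` should stay near the assumed population value of 15, while `sem` shrinks roughly in proportion to 1/√n. Collecting more data reduces our uncertainty about the mean, but it does nothing to reduce the variation among individuals.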
In Chapter 6, we considered measures of central tendency, like mean and median. In so doing, we touched on some basic measures of variation, such as standard deviation and the interquartile range, concepts visualized once again in Figure 7-1.
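As a quick refresher on those two measures of variation, the following minimal Python sketch computes both with the standard library. The data are made up for illustration, and the helper name `spread` is hypothetical. Quartiles can be computed under several conventions; this sketch assumes the `"inclusive"` method of `statistics.quantiles`, so other tools may report a slightly different interquartile range for the same data.

```python
import statistics


def spread(values):
    """Return (standard deviation, interquartile range) for a list of numbers."""
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Quartile cut points; "inclusive" is one of several quartile conventions.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return sd, q3 - q1


if __name__ == "__main__":
    # Made-up data: the second list replaces the largest value with an outlier.
    typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]
    for label, values in (("typical", typical), ("with outlier", with_outlier)):
        sd, iqr = spread(values)
        print(f"{label:>12}: sd={sd:.2f}  iqr={iqr:.2f}")
```

Running it prints `sd=2.74  iqr=4.00` for the typical list and `sd=28.59  iqr=4.00` for the list with the outlier. The single extreme value inflates the standard deviation roughly tenfold but leaves the interquartile range unchanged, which is one reason the two measures can tell different stories about the same data.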
We also considered two very different types of ...