“As the cathedral is to its foundation so is an effective presentation of facts to the data.”
There’s something breathtaking about witnessing data communicated well—it’s a lot like encountering an architectural wonder. Think of the first time you saw the video of Hans Rosling interacting with global development data on stage, or when you first viewed a well-designed New York Times visualization online. When data is communicated well, it’s easy to appreciate both the data itself and the delivery of that data at the same time. Those two elements can be fashioned together into an overall experience that makes you feel that you understand the world better, and that you want to do something with your newfound understanding.
On the other hand, think of a time when you suffered through a presentation at work that included poorly designed charts and graphs containing extraneous information, or all those infographics you wish you had never laid eyes on that skewed the figures horribly and left you feeling dumber. Either the foundation was hopelessly cracked or the building itself was inexcusably shabby, or both. Not every building is a cathedral.
What’s the difference between these two types of experiences? It’s a question of whether those who designed and delivered the message were adept at communicating data.
This is a book about just that. Communicating data is simply a special case of communicating in general (more about that in a minute)—one that incorporates quantitative statements about the universe. In this context, we aren’t using the word “data” in the general sense of factual information, but in the more specific sense of “information in numerical form that can be digitally transmitted or processed”—ones and zeros in databases, spreadsheets, and tables.
This is also a book about using Tableau. This book will show you how to use Tableau to communicate data well, though you can apply the principles and methods covered in this book to using other tools. It’s not intended to be an exhaustive Tableau manual, nor is it intended to guide you in the actual acquiring and storing of your data. While those are necessary steps, the goal of this book is to help you take all that data you have and convey its message with efficiency and impact.
How is “communicating data” distinct from the other steps in the overall process that begins with a question and ends with a shared insight? Figure 1-1 presents the overall data discovery process, and shows where communicating data fits in that process.
The highly iterative process often begins with a question, which can be specific (“which combination of products occurs the most often?”) or general (“what can we learn about historical sales of our products?”). The next step is gathering data if it’s available (e.g., historical sales). Then comes the often arduous process of structuring data, also called “data munging” or “data wrangling.” In this step, data is formatted, shaped, merged, converted, and otherwise manipulated into a form that is amenable to the next step, exploring data. In this step, the data is viewed and analyzed from a number of angles until one or more insights are gleaned. These insights form the message involved in communicating data, the step at which quantitative statements are shared with others. While this book primarily concerns this final step, it will also touch on the other steps in the process, as they contribute to the formation of the message to be communicated.
In order to examine the idea of communicating data in greater detail, let’s return to the birthplace of information theory: Bell Laboratories.
The year was 1949, and Claude Elwood Shannon of Bell Laboratories and his coauthor Warren Weaver published a seminal book with the University of Illinois Press called The Mathematical Theory of Communication. In it, they introduced a model of communication systems in which an “information source” selects a message and then a “transmitter changes this message into the signal which is actually sent over the communication channel from the transmitter to the receiver” (see Figure 1-2).
To illustrate the model, consider oral speech: the information source is the brain of a certain person; the transmitter is this person’s vocal system; the channel is the sound waves that travel as particles in the air collide; the receiver is the auditory system of a second person; and the destination is this second person’s brain. The noise source includes other sounds present at the time the first person speaks.
Shannon and Weaver describe how this model can apply to a wide variety of cases, including those in which the symbols are “written letters or words, or musical notes, or spoken words, or symphonic music, or pictures.” Put simply, the model describes the process of one mind attempting to affect another, and it’s the very essence of the human experience.
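To make the model concrete, here is a minimal Python sketch of a toy communication system. It is an illustration only, not anything from Shannon and Weaver's text: the function name transmit, the noise parameter, and the choice to represent the signal as a list of bits are all assumptions made for this example.

```python
import random

def transmit(message, noise, seed=0):
    """Send a message through a toy channel that flips each bit
    with probability `noise` (a hypothetical noise source)."""
    rng = random.Random(seed)
    # Transmitter: change the message into a signal (a list of bits).
    signal = [int(b) for byte in message.encode("utf-8") for b in f"{byte:08b}"]
    # Channel plus noise source: each bit may be flipped in transit.
    received = [bit ^ 1 if rng.random() < noise else bit for bit in signal]
    # Receiver: rebuild bytes from the received signal and decode them.
    data = bytes(
        int("".join(map(str, received[i:i + 8])), 2)
        for i in range(0, len(received), 8)
    )
    return data.decode("utf-8", errors="replace")

print(transmit("sales rose", noise=0.0))   # a clean channel
print(transmit("sales rose", noise=0.05))  # a noisy channel
```

With no noise, the destination gets exactly what the source selected. As the noise level rises, the received message drifts away from the original, which is the failure the rest of this chapter is concerned with.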
In this book, we’re dealing with the case in which the symbols communicated are abstract graphic representations of data in the form of charts, graphs, and maps: data visualizations. Viewing the communication of data in this conceptual framework is helpful because it reminds us of what we should be taking into account. Knowing how the system can fail is a key first step.
The technical problem: How accurately can the symbols of communication be transmitted?
The semantic problem: How precisely do the transmitted symbols convey the desired meaning?
The effectiveness problem: How effectively does the received meaning affect conduct in the desired way?
As far as technology has advanced since these problems were outlined, we still often suffer from technical problems (inadequate screen resolution, broken audio, grainy video, poor print quality): anything that results in the receiver getting something different from what was originally crafted. Considering all the different devices, operating systems, and software the person on the receiving end could be using, it can be challenging to make sure the message itself arrives intact.
The semantic problem occurs when we encode the message using inappropriate visualization types, or when the symbols chosen won’t be understood by the person on the receiving end. For example, encoding a value using a circle’s diameter rather than its area will skew the perceived proportions (see Figure 1-3).
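The arithmetic behind the circle example is easy to check. The following Python sketch compares the two ways of sizing a circle for a value; the function names and the sample values 1 and 4 are hypothetical, chosen only for illustration.

```python
import math

def radius_for_area(value, unit_radius=1.0):
    """Radius that makes the circle's AREA proportional to the value."""
    return unit_radius * math.sqrt(value)

def radius_for_diameter(value, unit_radius=1.0):
    """Radius that makes the circle's DIAMETER proportional to the value."""
    return unit_radius * value

def circle_area(radius):
    return math.pi * radius ** 2

small, large = 1.0, 4.0  # hypothetical data values; large is 4x small

# Ratio of the ink on the page when area encodes the value.
area_ratio = circle_area(radius_for_area(large)) / circle_area(radius_for_area(small))
# Ratio of the ink on the page when diameter encodes the value.
diameter_ratio = circle_area(radius_for_diameter(large)) / circle_area(radius_for_diameter(small))

print(area_ratio, diameter_ratio)
```

Because area grows with the square of the diameter, a value that is four times larger is drawn with sixteen times the area when diameter is used for the encoding, so the reader sees a far bigger difference than the data contains.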
Another example of the semantic problem occurs when symbols are used that are only understood by a subset of all the audience members, such as the donkey and elephant icons that represent the Democratic and Republican parties of the American political system.
The effectiveness problem is the “so what?” problem, and it might be the most important. If everything falls into place, and the message is perfectly encoded, transmitted, decoded, and understood, but the recipient doesn’t care, or doesn’t take the desired action, then the communication ultimately failed.
In order to address these three types of communication problems, I’d like to propose six principles to consider when communicating data. They are numbered in the general order that they transpire, though it’s fully recognized that this process is highly iterative and rarely proceeds in a straight line. Communicating is a creative process—one that involves crafting and refining a message—and as such it will necessarily involve many loops:
1. Know your goal
2. Use the right data
3. Select suitable visualizations
4. Design for aesthetics
5. Choose an effective medium and channel
6. Check the results
Let’s look at these principles in detail.
It’s important to note that “information” and the “message” are not synonymous. Information is the set of all possible messages that can be selected by the information source. The message is what was selected from this set to be communicated. Why does this matter? In a world where information is increasing exponentially, choosing your message is an important first step.
Before you choose your message, however, it’s critical to know your goal, which you can articulate by answering a few key questions up front (see Figure 1-4):
Who are you trying to communicate with? (target audience)
What do you want them to know? (intended meaning)
Why? What do you want them to do about it? (desired effect)
The answers to these questions may be very different for different disciplines. A data journalist working on a breaking story doesn’t have the same goal as a business intelligence analyst working in a corporation. That they would communicate data differently shouldn’t be surprising, and may be entirely appropriate.
The important part is articulating your goal—actually writing out the answers to the three questions just listed. If you’re not certain about the answer to any one of these questions, don’t go any further until you’re sure. (And it’s OK if your sole purpose is to make someone laugh. You don’t have to be trying to achieve world peace with every data message.)
As the saying goes, sometimes less is more. One of the most impactful examples of communicating data that I’ve ever seen involved the presentation of a single number: 14. That was the single data point shared with a group of managers assembled to discuss customer service within an organization. The group of managers came to learn that this number represented the number of times a particular customer had been transferred between departments during a single call to a helpline. It motivated an entire organization to revamp the customer experience.
Sometimes less is really less, though. While driving in the car, I heard a report on the radio in which a number of cities were compared based on the percentage of fish packages that were mislabeled. Digging into the data myself later that day, I found that the sample sizes were too small to infer much of anything about the relative mislabeling rates in the cities. A whole host of listeners were misled by the story at least as much as by the fish labels.
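One rough way to see why small samples support so little is to put an interval around each percentage. The sketch below uses the Wilson score interval for a proportion. The counts are invented for illustration and are not the figures from the radio report, and the names wilson_interval, city_a, and city_b are assumptions of this example.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

# Hypothetical samples: 20 packages tested in each city.
city_a = wilson_interval(6, 20)   # 30% mislabeled in the sample
city_b = wilson_interval(9, 20)   # 45% mislabeled in the sample

print("City A:", city_a)
print("City B:", city_b)

# The same observed rates with 100 times as many packages.
print("City A, larger sample:", wilson_interval(600, 2000))
print("City B, larger sample:", wilson_interval(900, 2000))
```

With only 20 packages per city, the two intervals overlap heavily, so ranking one city above the other is not well supported. With 2,000 packages each, the same percentages give intervals that no longer overlap. Overlapping intervals are only an informal check, not a formal test, but they are a quick warning sign.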
And more is often less. It’s possible, and actually quite typical, to overwhelm the audience with data. It’s easy to see why this happens: you worked hard to gather the data, and it feels like that data increases the weight of your message and lends additional credibility. But all that extra data only serves to drown out the message. Shannon and Weaver identified this problem: “if you overcrowd the capacity of the audience, you force a general and inescapable error and confusion.” In other words, if a data point doesn’t add to your message, then it detracts from it.
The last and most important point about selecting data is that your message must be both ethical and based on sound epistemology. In other words: don’t lie with statistics; we have enough of that to contend with already. Don’t fall prey to the many statistical and logical fallacies, such as mistaking correlation for causation, taking unreasonable inductive leaps, assuming a Gaussian (normal) distribution when the data doesn’t follow one, inferring more than the sample size allows, and so on. These are just a few of the many icebergs to avoid (in this book, I hope to show you how to avoid some of them when you use Tableau).
Once you’ve identified the data that you’ll need to make your point, the next step is deciding how to encode the message. Encoding the data means converting the data values themselves into abstract graphical representations, like size or color or shape.
Knowing how the human mind makes use of different graphical displays of information to perform specific tasks is the key to avoiding the semantic problem (wherein the symbols don’t convey the intended meaning precisely). Luckily for us, the last half-century has produced pioneers in the field of information visualization who have shed considerable light on this topic.
Tableau’s own Jock Mackinlay has produced a helpful framework for identifying the order of effectiveness of different encoding variables based on the type of data being used. First, let’s start with a description of the different types of data: quantitative, ordinal, and nominal (see Figure 1-5).
A few points are immediately obvious:
Position is the most effective form of encoding for all data types.
Length, angle, and area decrease in effectiveness from quantitative to ordinal to nominal.
Color hue increases in effectiveness from quantitative to ordinal to nominal.
Keeping this ranking in mind as you select your visualization type will help ensure you are crafting a message that will be easily decoded and understood by your audience.
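The ranking can also be written down as a small lookup table. The Python sketch below assumes the ordering as it is commonly reproduced from Mackinlay's 1986 work, so check it against the figure before relying on it. The names RANKING, rank, and best_available are hypothetical helpers for this example and have nothing to do with Tableau's own interface.

```python
# Encodings listed from most to least effective for each data type.
# Assumption: this ordering follows the commonly reproduced Mackinlay ranking.
RANKING = {
    "quantitative": [
        "position", "length", "angle", "slope", "area", "volume", "density",
        "color saturation", "color hue", "texture", "connection",
        "containment", "shape",
    ],
    "ordinal": [
        "position", "density", "color saturation", "color hue", "texture",
        "connection", "containment", "length", "angle", "slope", "area",
        "volume", "shape",
    ],
    "nominal": [
        "position", "color hue", "texture", "connection", "containment",
        "density", "color saturation", "shape", "length", "angle", "slope",
        "area", "volume",
    ],
}

def rank(encoding, data_type):
    """1 means most effective for that data type."""
    return RANKING[data_type].index(encoding) + 1

def best_available(data_type, available):
    """Pick the most effective encoding among those still unused."""
    return min(available, key=lambda encoding: rank(encoding, data_type))

for data_type in RANKING:
    print(data_type, "color hue rank:", rank("color hue", data_type))

# Position is already taken, so choose between what is left.
print(best_available("nominal", ["area", "color hue", "shape"]))
print(best_available("quantitative", ["area", "color hue", "length"]))
```

A lookup like this is handy when the best encoding is already in use: if position is spoken for, it tells you which of the remaining options your audience will decode most easily for the type of data at hand.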
If the overall quality of the communication were only affected by the ease of decoding, we would not need any more principles. In actuality, we also need to consider aesthetics, media and channel, and the actual impact.
Let me play devil’s advocate: Why consider aesthetics at all? Isn’t any attempt to make a visualization “look better” just chart junk or design fluff? Won’t graphic elements that aren’t data just get in the way? Shouldn’t the data itself be beautiful enough for readers?
I understand this viewpoint, I really do. I’ve seen plenty of attempts to beautify data visualizations that either distract the audience or, worse, distort the data so as to completely mislead the audience. We all agree that this result must be avoided. One way to avoid it is to banish all aesthetic elements forevermore. And yet, that’s not a world I’d want to live in, because there is a clear value to elegant design and what Willard Cope Brinton called “judicious embellishment of charts.”
The value? Aesthetic elements can arouse interest and enhance memory. So long as they do so without overly hampering cognition, they can be used to achieve the goal.
There are a number of aesthetic elements of every data visualization, and a handful of common mistakes people make when creating them:
Poor color schemes
Many different fonts
Vertical or angled labels
Dark background colors
Thick borders or grid lines
Useless images and clip art
Lazily accepting most software defaults
Consider Figure 1-7, which shows two charts that illustrate the growth of the number of possible moves in a chess game as the game progresses. The default Excel chart is on the left and a redesigned version is on the right.
Figure 1-8 shows another example of poor design and improved design, this time showing the growth of employment at Apple after the return of Steve Jobs in 1997.
A little design goes a long way. If you know a good graphic artist, take her out for coffee and get her input. Design is a whole separate discipline that you could spend a lifetime learning about and perfecting, but paying even a small amount of attention to how your data visualizations look can mean the difference between being ignored and arousing interest, or between being quickly forgotten and being remembered for a while to come.
What form the message takes (medium) and how it gets delivered to the audience (channel) are critical elements of any data communication effort. Care needs to be taken in selecting the “how,” the “when,” and the “where” to improve the chances that your audience is reached and your goals are met.
Earlier, I referred to Hans Rosling’s famous presentation at TED in February of 2006: the animated Gapminder scatterplot, the narration, and the pointing and arm waving were all key features of the communication effort. The data set he was presenting was complex, and the communication effort was also complex. He pulled it off, and the impact has been incredibly deep.
When you communicate data, there are a few choices to make about how you will do it:
Standalone graphics or narrated?
Static, interactive, animated, or combined graphics?
If narrated: recorded, live, or both?
If live: remote, in person, or both?
In all cases: broadcast, directed, or both?
The framework in Figure 1-9 shows how these choices typically relate in terms of effort, reach, and likely impact.
On the one hand, it’s obviously very simple and easy to create a static chart and send an email to a group of colleagues or publish it to the Web as a standalone graphic. This approach to communicating data could have a very deep impact on your target audience, but it most likely will not. It’s also important to note that the cost in time and effort is very low.
On the other hand, narrating a combined set of static and dynamic graphics in person to a live audience is a very complex endeavor. A limited number of people will be present, but if you pull it off the way Hans Rosling did, the impact could be enormous. The effort is high (and don’t forget to rehearse).
These are both extreme examples of communicating data. The area in between these two extremes includes publishing blog posts that combine interactive data visualizations and detailed commentary—something Tableau Public makes very easy to do.
As with anything, there is a trade-off between cost and impact at play here. If your target audience is a small firm in South Africa and the stakes are high, for example, getting on an airplane to walk them through the data may be a good investment. On the other hand, if you’d like as many people as possible in the general public to receive a data message, you’ll have to find an effective way to broadcast the message. Knowing your goal, and knowing who makes up your target audience, informs these decisions.
It is a good habit in general to build feedback loops and checkpoints into your efforts so that you can gauge whether you’ve achieved your intended results. This allows for course correction in the case of woefully unmet goals, or fine-tuning in the case of slight miscues.
There are a few questions to ask when you check the results. We’ll call this the “RUI”:
Reach: Did the audience even receive your message at all? Who did and who didn’t?
Understanding: Did the audience interpret the data message in the way you intended?
Impact: Did the audience react in the way you wanted them to react?
In this chapter, we considered the act of communicating data as an integral step in a larger data discovery process, and an important type of communication in general. We also considered three problems that can get in the way of communicating data well—the technical problem, the semantic problem, and the effectiveness problem. Lastly, we considered six principles to overcome these problems and achieve our goals. These six principles can be applied regardless of the tool or software used.
In the next chapter, we’ll provide a general overview of one particular software tool for communicating data: Tableau.