Chapter 4. Data Visualization: Pictures make you smarter

You need more than a table of numbers.

Your data is brilliantly complex, with more variables than you can shake a stick at. Mulling over mounds and mounds of spreadsheets isn’t just boring; it can actually be a waste of your time. A clear, highly multivariate visualization can show you, in a small space, the forest that you’d miss for the trees if you looked only at spreadsheets.

New Army needs to optimize their website

New Army is an online clothing retailer that just ran an experiment to test web layouts. For one month, everyone who came to the website was randomly served one of these three home page designs.
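If you’re wondering what “randomly served” might look like in practice, here is a minimal sketch. The chapter doesn’t say how New Army actually implemented it, so the equal-probability choice, the fixed seed, and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter

# The three designs from the experiment.
HOME_PAGES = ("Home Page #1", "Home Page #2", "Home Page #3")


def assign_home_page(rng: random.Random) -> str:
    """Pick one of the three designs with equal probability (an assumption)."""
    return rng.choice(HOME_PAGES)


def simulate_visits(n_visitors: int, seed: int = 0) -> Counter:
    """Assign each simulated visitor a design and count how many saw each one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return Counter(assign_home_page(rng) for _ in range(n_visitors))


counts = simulate_visits(3000)
for page in HOME_PAGES:
    print(page, counts[page])
```

Because chance alone decides who sees which page, differences in behavior between the three groups are much easier to attribute to the designs themselves.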

image with no caption

They had their experiment designers put together a series of tests that promise to answer a lot of their questions about their website design.

What they want to do is find the best stylesheets to maximize sales and get people returning to their website.

The results are in, but the information designer is out

Now that they have a store of fantastic data from a controlled, randomized experiment, they need a way to visualize it all together.

So they hired a fancy information designer and asked him to pull together something that would help them understand the implications of their research. Unfortunately, all did not work out as planned.

image with no caption

You’ll need to redesign the visualizations for the analysis. It could be hard work, because the experiment designers at New Army are an exacting bunch and generated a lot of solid data.

But before we start, let’s take a look at the rejected designs. We’ll likely learn something by knowing what sort of visualizations won’t work.

The last information designer submitted these three infographics

Take a look at the three designs the information designer submitted to New Army. What are your impressions? Can you see why the client might not have been pleased?

image with no caption

What data is behind the visualizations?

“What is the data behind the visualizations?” is the very first question you should ask when looking at a new visualization. You care about the quality of the data and its interpretation, and you’d hate for a flashy design to get in the way of your own judgments about the analysis.

image with no caption

Brain Power

What sort of data do you think is behind these visualizations?

Show the data!

You can’t tell from these visualizations what data is behind them. If you’re the client, how could you ever expect to be able to make useful judgments with the visualizations if they don’t even say clearly what data they describe?

Show the data. Your first job in creating good data visualizations is to facilitate rigorous thinking and good decision making on the part of your clients, and good data analysis begins and ends with thinking with data.

image with no caption

New Army’s actual data, however, is really rich and has all sorts of great material for your visualizations.

image with no caption

Here’s some unsolicited advice from the last designer

You didn’t ask for it, but it appears that you’re getting it anyway: the outgoing information designer wants to put in his two cents about the project. Maybe his perspective will help...

image with no caption

Too much data is never your problem

It’s easy to get scared by looking at a lot of data.

image with no caption

But knowing how to deal with what seems like a lot of data is easy, too.

If you’ve got a lot of data and aren’t sure what to do with it, just remember your analytical objectives. With these in mind, stay focused on the data that speaks to your objectives and ignore the rest.

image with no caption

Oh, really? Do you think it’s your job as a data analyst to create an aesthetic experience for your clients?

Making the data pretty isn’t your problem either

If the data visualization solves a client’s problem, it’s always attractive, whether it’s something really elaborate and visually stimulating or whether it’s just a plain ol’ table of numbers.

Making good data visualizations is just like making any sort of good data analysis. You just need to know where to start.

image with no caption

Brain Power

So how do you use a big pile of data with a bunch of different variables to evaluate your objectives? Where exactly do you begin?

Data visualization is all about making the right comparisons

To build good visualizations, first identify the fundamental comparisons that will address your client’s objectives. Take a look at their most important spreadsheets:

image with no caption

While New Army has more data than these three sheets, these sheets have the comparisons that will speak directly to what they want to know. Let’s try out a comparison now...
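One simple comparison you could try is average revenue per home page. The sketch below is only an illustration: the rows are made up (they are not New Army’s results), the column names HomePage and Revenue are assumed, and this is not necessarily the comparison the chapter goes on to draw.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; these are NOT New Army's results.
visits = [
    {"HomePage": 1, "Revenue": 20.0},
    {"HomePage": 1, "Revenue": 0.0},
    {"HomePage": 2, "Revenue": 35.0},
    {"HomePage": 2, "Revenue": 15.0},
    {"HomePage": 3, "Revenue": 50.0},
    {"HomePage": 3, "Revenue": 30.0},
]


def mean_by_group(rows, group_key, value_key):
    """Average value_key within each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in sorted(groups.items())}


revenue_by_page = mean_by_group(visits, "HomePage", "Revenue")
for page, avg in revenue_by_page.items():
    print(f"Home Page #{page}: mean revenue {avg:.2f}")
```

A table of averages like this makes a comparison, but it hides the individual visits. That is one reason to plot every data point instead.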

Your visualization is already more useful than the rejected ones

Now that’s a nice chart, and it’ll definitely be useful to your client. It’s an example of a good data visualization because it...

  • Shows the data

  • Makes a smart comparison

  • Shows multiple variables

Note

Here’s another feature of great visualizations.

image with no caption

Use scatterplots to explore causes

Scatterplots are great tools for exploratory data analysis, which is the term statisticians use to describe looking around in a set of data for hypotheses to test.

Analysts like to use scatterplots when searching for causal relationships, where one variable is affecting the other. As a general rule, the horizontal x-axis of the scatterplot represents the independent variable (the variable we imagine to be a cause), and the vertical y-axis of a scatterplot represents the dependent variable (which we imagine to be the effect).

image with no caption

You don’t have to prove that the value of the independent variable causes the value of the dependent variable, because after all we’re exploring the data. But causes are what you’re looking for.
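Here is a rough sketch of that axis convention in code, assuming matplotlib is available. The data points are invented, and treating TimeOnSite as the independent variable and Revenue as the dependent one is just a hypothesis to explore, not a finding.

```python
# Made-up visits for illustration only.
visits = [
    {"TimeOnSite": 2.0, "Revenue": 0.0},
    {"TimeOnSite": 5.5, "Revenue": 18.0},
    {"TimeOnSite": 9.0, "Revenue": 42.0},
]


def column(rows, key):
    """Pull one variable out of a list of row dictionaries."""
    return [row[key] for row in rows]


def draw_scatter(rows, x_key, y_key, path):
    """Save a scatterplot with the imagined cause on x and the effect on y."""
    # Imported here so the data handling above runs without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(column(rows, x_key), column(rows, y_key))
    ax.set_xlabel(f"{x_key} (independent variable)")
    ax.set_ylabel(f"{y_key} (dependent variable)")
    fig.savefig(path)
    plt.close(fig)


xs = column(visits, "TimeOnSite")
ys = column(visits, "Revenue")
print(list(zip(xs, ys)))
```

To write the chart to disk, you could call draw_scatter(visits, "TimeOnSite", "Revenue", "scatter.png"); the file name is arbitrary.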

image with no caption

The best visualizations are highly multivariate

A visualization is multivariate if it compares three or more variables. And because good comparisons are fundamental to data analysis, the more multivariate your visualization is, the more likely you are to find the best comparisons.

And in this case you’ve got a bunch of variables.

image with no caption

Brain Power

How would you make the scatterplot visualization you’ve created more multivariate?

Show more variables by looking at charts together

One way of making your visualization more multivariate is just to show a bunch of similar scatterplots right next to each other, and here’s an example of such a visualization.

All of your variables are plotted together in this format, which enables you to compare a huge array of information right in one place. Because New Army is really interested in revenue comparisons, we can just stick with the charts that compare TimeOnSite, Pageviews, and ReturnVisits to revenue.
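A grid of side-by-side scatterplots like the one described above could be sketched as follows, assuming matplotlib is available. The layout (one row per home page, one column per variable, Revenue always on the y-axis) and the sample rows are assumptions; the chapter’s own chart may be arranged differently.

```python
X_VARIABLES = ("TimeOnSite", "Pageviews", "ReturnVisits")
HOME_PAGES = (1, 2, 3)

# Made-up visits for illustration only; these are NOT New Army's results.
visits = [
    {"HomePage": 1, "TimeOnSite": 3.0, "Pageviews": 4, "ReturnVisits": 0, "Revenue": 0.0},
    {"HomePage": 2, "TimeOnSite": 6.0, "Pageviews": 9, "ReturnVisits": 1, "Revenue": 25.0},
    {"HomePage": 3, "TimeOnSite": 8.0, "Pageviews": 12, "ReturnVisits": 3, "Revenue": 60.0},
]


def panel_points(rows, page, x_key, y_key="Revenue"):
    """(x, y) pairs for one panel: one home page, one variable versus revenue."""
    return [(r[x_key], r[y_key]) for r in rows if r["HomePage"] == page]


def draw_small_multiples(rows, path):
    """Save a grid of scatterplots: rows are home pages, columns are variables."""
    # Imported here so the data handling above runs without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(
        len(HOME_PAGES), len(X_VARIABLES),
        sharex="col", sharey=True, figsize=(9, 9), squeeze=False,
    )
    for i, page in enumerate(HOME_PAGES):
        for j, x_key in enumerate(X_VARIABLES):
            ax = axes[i][j]
            points = panel_points(rows, page, x_key)
            if points:
                xs, ys = zip(*points)
                ax.scatter(xs, ys, s=10)
            ax.set_title(f"Home Page #{page}")
            if i == len(HOME_PAGES) - 1:
                ax.set_xlabel(x_key)  # label only the bottom row
            if j == 0:
                ax.set_ylabel("Revenue")  # label only the first column
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


print(panel_points(visits, 3, "Pageviews"))
```

Sharing the axes across panels is a deliberate choice here: it lets your eye compare the home pages directly instead of rereading each scale. To write the chart to disk, you could call draw_small_multiples(visits, "small_multiples.png").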

image with no caption

Let’s see what the client thinks...

The visualization is great, but the web guru’s not satisfied yet

You just got an email from your client, the web guru at New Army, assessing what you created for him. Let’s see what he has to say...

image with no caption

He wants to know about causality.

Knowing what designs work only takes him so far. In order to make his website as powerful as possible, he needs some idea of why people interact with the different home pages the way they do.

And, since he’s the client, we definitely need to address the theories he put forward.

Good visual designs help you think about causes

The model that you and your client prefer will usually fit the data.

image with no caption

But there are always other possibilities, especially when you are willing to get imaginative about the explanations. What about other models?

image with no caption

You need to address alternative causal models or explanations as you describe your data visualization. Doing so is a real mark of integrity: it shows your client that you’re not just presenting the version of the story that you like best, but that you’re also thinking through possible failure points in your theories.

The experiment designers weigh in

The experiment designers saw the web guru’s theories and sent you some of their thoughts. Perhaps their input will enable you to evaluate the web guru’s hypotheses about why some home pages performed better than others.

To: Head First
From: New Army experiment designers
Re: The boss’s ideas

He thinks that page loads count? That could be. We haven’t taken a look at the data yet to see for sure. But in our testing, #2 was the fastest, followed by #3, and then #1. So, sure, he could be right.

Note

Here’s what the experiment designers think about the first hypothesis.

As for the cooler color palette, we kind of doubt it. The color palette of Home Page #3 is coolest, followed by #2, then #1, by the way. There’s research to show that people react differently to different palettes, but none of it has really persuaded us.

Note

Here’s their response to the second hypothesis.
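One crude way to weigh the two hypotheses against what you already know is to ask whether the page that ranks first on each factor is also the page that performed best. The sketch below uses only the rankings from the email plus the fact that Home Page #3 came out strongest (the designers’ next email confirms it). The function name is made up, and a match only means the theory is consistent with the data, not that the factor is the cause.

```python
# Rankings reported by the experiment designers, first place first.
load_speed_ranking = [2, 3, 1]    # fastest to slowest
cool_palette_ranking = [3, 2, 1]  # coolest to least cool

# The only performance fact used here: Home Page #3 did best.
best_performer = 3


def top_matches_best(ranking, best):
    """True if the page ranked first on a factor is also the best performer."""
    return ranking[0] == best


hypotheses = [
    ("page load speed", load_speed_ranking),
    ("cooler color palette", cool_palette_ranking),
]
for name, ranking in hypotheses:
    if top_matches_best(ranking, best_performer):
        verdict = "is consistent with"
    else:
        verdict = "does not line up with"
    print(f"The {name} theory {verdict} Home Page #{best_performer} doing best.")
```

Notice that the fastest page (#2) is not the best performer, so load speed alone is an awkward explanation, while the coolest page (#3) is. That still proves nothing about color: the designers themselves are skeptical, and the pages differ in many other ways.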

image with no caption

The experiment designers have some hypotheses of their own

They’ve had an opportunity to take a look at your scatterplots and sent you some of their own thinking about what’s going on. These people are data junkies, and their hypotheses definitely fit.

To: Head First
From: New Army experiment designers
Re: We don’t know why Home Page #3 is stronger

We’re delighted to hear that #3 is the best, but we really don’t know why. Who knows what people are thinking? But that is actually OK: as long as we’re showing improvement on the business fundamentals, we don’t need to understand people in a deep way. Still, it’s interesting to learn as much as we can.

The stylesheets are really different from each other in many ways. So when it comes to isolating individual features that might account for the performance differential, it’s hard. In the future, we’d like to take Home Page #3 and test a bunch of subtle permutations. That way, we might learn things like how button shape or font choice affects user behavior.

Note

Here’s what the experiment designers want to do next.

But we conjecture that there are two factors. First, Home Page #3 is really readable. We use fonts and a layout that are easy on the eyes. Second, the page hierarchy is flatter. You can find pretty much everything in three clicks, whereas for Home Page #1 it takes you more like seven clicks to find what you want. Both could be affecting our revenue, but we need more testing to say for sure.

Note

Maybe it’s fonts and layout.

Note

Maybe it’s hierarchy of the pages.

The client is pleased with your work

You created an excellent visualization that enabled New Army to quickly and simultaneously assess all the variables they tested in their experiment.

And you evaluated that visualization in light of a bunch of different hypotheses, giving them some excellent ideas about what to test for in the future.

image with no caption

Orders are coming in from everywhere!

Because of the new website, traffic is greater than ever. Your visualization of the experimental results showed what they needed to know to spruce up their website.

image with no caption

Even better, New Army has embarked on a continuous program of experimentation to fine-tune their new design, using your visualization to see what works. Nice job!

image with no caption
