Communicating Data with Tableau

Chapter 4. Ratios and Rates

“We are the safest large city in America, but any crime rate is too high.”

- Michael Bloomberg, former mayor of New York City

Until now, we’ve considered absolute amounts of countable and measurable nouns, like population, tons of garbage collected, and number of rat sightings by borough. Pleasant, I know.

But not all boroughs are created equal. As we saw in Chapter 1, they have different surface areas and different numbers of people living in them. The fact that they produce different amounts of garbage isn’t surprising, and probably doesn’t make for a very interesting message to communicate to an audience.

These absolute comparisons, while helpful for showing the big picture, aren’t generally considered “apples-to-apples” comparisons. There are apples, and, well, there are Big Apples.

A helpful way to “normalize” comparisons is using ratios, rates, proportions, and percentages. What’s the difference between these four?

A ratio is a comparison of two terms expressed as a quotient. For example, Manhattan produced 0.264 tons of recyclables for every ton of refuse. Ratios can be expressed as “x to y,” “x:y,” “x/y,” or as a decimal.
A rate is a ratio in which the two terms have different units. For example, the population density of Brooklyn is 36,136 residents per square mile. Rates are often predictive because time can be used as the denominator (crime rates, population growth rates, etc.).
A proportion is a ratio in which the numerator is a partial amount and the denominator is the total amount. For example, the proportion of the NYC population living in the Bronx is 0.169. A proportion is expressed as a number between 0 and 1.
A percentage is a ratio comparing a number to 100. For example, 16.9% of NYC residents live in the Bronx. A percentage is generally a number between 0 and 100, but can be larger than 100 (e.g., “sales have increased by 150% year-over-year”).
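
The four definitions can be made concrete with a few lines of arithmetic. The sketch below is written in Python rather than Tableau, and every input figure is a made-up placeholder (not DSNY or census data); only the relationships between the four quantities matter.

```python
# Hypothetical inputs, chosen only to illustrate the definitions.
recycled_tons = 30.0      # same unit as refuse_tons
refuse_tons = 120.0
residents = 50_000        # people
area_sq_miles = 2.0       # a different unit from residents
borough_pop = 1_500_000
city_pop = 8_000_000

# Ratio: a quotient of two terms (here both in tons).
ratio = recycled_tons / refuse_tons

# Rate: a ratio whose terms have different units (residents per square mile).
rate = residents / area_sq_miles

# Proportion: part divided by whole, so it falls between 0 and 1.
proportion = borough_pop / city_pop

# Percentage: the same comparison scaled to "per 100".
percentage = proportion * 100

print(f"ratio: {ratio:.3f} tons recycled per ton of refuse")
print(f"rate: {rate:,.0f} residents per square mile")
print(f"proportion: {proportion:.4f}")
print(f"percentage: {percentage:.2f}%")
```

Note that the proportion and the percentage carry the same information; the percentage is simply the proportion multiplied by 100.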

These types of normalized comparisons can make for much more interesting messages to communicate. Luckily, they are quite easy to create in Tableau. We’ll explore the first two in this chapter, and the last two in the next.

Ratios

The data source itself may include a field that is a ratio, but most often the raw data includes terms that an analyst needs to combine to create a ratio. Tableau can handle these types of computations with Calculated Fields, in which new fields can be created by combining existing fields using a variety of operations. We introduced the concept in Chapter 2, and now we’ll build on it.

Let’s return to the DSNY data set to illustrate the technique, and we’ll see if the ratio comparison is any more interesting than the absolute figures.

The data reported by DSNY includes three continuous Measures (quantitative fields) for each community district:

RefuseTonsCollected: Amount of nonrecyclable garbage
PaperTonsCollected: Amount of recyclable paper
MGPTonsCollected: Amount of recyclable metal, glass, and plastic

The second two data types (paper and MGP) are both recyclable, and so can be summed to create a new variable for each community district that we will call RecyclableTonsCollected to stay consistent with the provided nomenclature.

A comparison of the ratio of recyclable material to refuse would be interesting to consider, as it would indicate which communities in New York City are doing a better job recycling.

First, we’ll need to create a new calculated field that sums the two recyclable fields. Right-click anywhere in the lefthand Dimensions or Measures panels, and select Create Calculated Field. Then, fill out the resulting dialog box, as shown in Figure 4-1.

Figure 4-1. Summing two fields with a Calculated Field

Next, let’s create a second calculated field that creates the ratio of recyclable material to refuse. Right-click in the Dimensions or Measures area again and fill out the dialog box as shown in Figure 4-2.

Notice that we put the numerator and denominator in brackets, preceded by SUM. This is critical, because as we move up a level from community district to borough, we don’t want to add the individual ratios of each community district; we want to create a new ratio that represents the quotient of the aggregate amounts. Adding the individual ratios would produce a larger figure that is misleading.

Figure 4-2. Creating a ratio with a Calculated Field
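
The figure itself is not reproduced here. From the description, the formula presumably has the shape SUM([RecyclableTonsCollected]) / SUM([RefuseTonsCollected]), although the exact text in the dialog box is an assumption. The Python sketch below uses invented tonnages for three districts in one borough to show why aggregating before dividing matters; none of these numbers come from the DSNY data.

```python
# Invented tonnages for three community districts in one borough.
districts = [
    {"paper": 10.0, "mgp": 10.0, "refuse": 100.0},
    {"paper": 30.0, "mgp": 20.0, "refuse": 100.0},
    {"paper": 5.0,  "mgp": 5.0,  "refuse": 200.0},
]

# Row-level calculation, as in the first calculated field:
# recyclable = paper + MGP for each district.
for d in districts:
    d["recyclable"] = d["paper"] + d["mgp"]

# Aggregate first, then divide: SUM(recyclable) / SUM(refuse).
ratio_of_sums = (
    sum(d["recyclable"] for d in districts)
    / sum(d["refuse"] for d in districts)
)

# Divide first, then add: the approach the text warns against.
sum_of_ratios = sum(d["recyclable"] / d["refuse"] for d in districts)

print(f"ratio of sums: {ratio_of_sums:.3f}")
print(f"sum of ratios: {sum_of_ratios:.3f}")
```

With these invented numbers the borough recycles 80 tons against 400 tons of refuse, a ratio of 0.2, whereas adding the three district ratios gives roughly 0.75, which describes no real quantity.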

Now that we’ve created the ratios, how can we best visualize them? Let’s start with a simple bar chart of the aggregate ratios of each of the boroughs. To create one, we’ll drag the Borough field from the Dimensions area on the left to the Rows shelf, and we’ll drag the Recycle to Refuse Ratio calculated field to the Columns shelf. Color wouldn’t strictly be needed, but we’ll add it anyway for aesthetics by dragging Borough to the Color shelf. Lastly, we’ll sort the bars in descending order to put the borough with the highest recycle ratio at the top. The screen shot in Figure 4-3 shows the resulting view.

We can easily see from this simple bar chart that Manhattan has the highest ratio of recycle to refuse, but that it still only recycles about one ton of material for every four tons of trash. The ratio for Manhattan is about twice as great as the ratio for the Bronx.

Does this mean each community district in Manhattan had a higher recycle ratio than each community district in the Bronx? Not necessarily, as we’re dealing with the terms in the aggregate.

How would we communicate the results at a community district level? If we just replace Borough with CommunityDistrict in the Rows shelf, then we’ll get a bar chart where all the community district #1s will be aggregated, and all the community district #2s will be aggregated, and so on. Aggregating community districts in this way isn’t very meaningful. Instead, we’d like to see community district ratios for each borough.
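
The effect of grouping by district number alone can be sketched outside Tableau. The Python example below uses four invented rows (two boroughs, two districts each) and a hypothetical helper, `ratio_by`, to compare the two groupings; it is an illustration of the aggregation logic only.

```python
from collections import defaultdict

# Invented rows: (borough, district number, recyclable tons, refuse tons).
rows = [
    ("Bronx", 1, 10.0, 100.0),
    ("Bronx", 2, 20.0, 100.0),
    ("Manhattan", 1, 40.0, 100.0),
    ("Manhattan", 2, 30.0, 100.0),
]

def ratio_by(key_fn):
    """Return SUM(recyclable) / SUM(refuse) for each group key."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for borough, district, recyclable, refuse in rows:
        key = key_fn(borough, district)
        totals[key][0] += recyclable
        totals[key][1] += refuse
    return {key: rec / ref for key, (rec, ref) in totals.items()}

# District number alone: the two "#1" districts collapse into one bar.
by_number = ratio_by(lambda borough, district: district)

# Borough plus district number: one bar per real community district.
by_borough_and_number = ratio_by(lambda borough, district: (borough, district))

print(by_number)
print(by_borough_and_number)
```

Grouping by number alone yields two groups that blend unrelated neighborhoods, while grouping by borough and number keeps all four districts distinct.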

Figure 4-3. Visualizing the ratio of recycle to refuse for each borough

To make this comparison, let’s click and drag the Borough pill from the Rows shelf up to the Columns shelf, and add the CommunityDistrict field from the Dimensions panel to the Rows shelf to create a grid of bar charts, as shown in Figure 4-4.

Figure 4-4. Ratios by community district and borough

From this view, we can now see that even within boroughs, not all community districts are created equal. Manhattan community district #1 has the highest ratio overall, but it’s clear that some community districts in the Bronx have higher ratios than some in Manhattan.

But what if we wanted to know the exact ratios? The bar charts in Figure 4-4 make it fairly easy to get a general sense of the relative ratios for the community districts, but precision isn’t easily obtained, is it? Can you tell the exact ratio for Manhattan community district #1? I can’t. If knowing the precise amounts is a task our audience will want to perform, we’ll need to think about how to make it easy for them to accomplish that task.

We could increase precision by dragging the Recycle to Refuse Ratio from the Measures area to the Label shelf (or Ctrl-selecting AGG(Recycle to Refuse Ratio) and dragging it from the Columns shelf to the Label shelf to duplicate). The resulting bar chart grid with labels is shown in Figure 4-5.

Figure 4-5. Adding labels to the bar chart grid

That’s a lot of decimal places, isn’t it? Our audience probably doesn’t need to know the ratio to the fourth decimal place (ten-thousandths). We can change the number of digits shown by right-clicking the Recycle to Refuse Ratio in the Measures panel, selecting Default Properties, then Number Format, and choosing Number (Custom). In the resulting dialog box, we’ll change the decimal places to three, as shown in Figure 4-6.

Figure 4-6. Specifying the number of decimal places in the Measures panel

Notice that the labels in the bar chart grid change accordingly.

What we really have here is a table that has been augmented by adding bars with lengths proportional to the values in each cell. Another way to show this precise comparison is by getting rid of the bars and instead coloring each cell by the ratio: a highlight table.

Let’s make a highlight table by starting with a new sheet. Ctrl-click Borough, CommunityDistrict, and Recycle to Refuse Ratio, and then open the Show Me panel and select highlight table. Tableau creates the view shown in Figure 4-7.

Figure 4-7. The default highlight table for Recycle to Refuse Ratio

Notice that Tableau created a highlight table that’s “flipped” compared to our bar chart grid shown in Figure 4-5 (Borough is in Rows instead of Columns, and CommunityDistrict is in Columns instead of Rows). To flip it to match, we’ll just click the Swap button in the toolbar, resulting in the taller table shown in Figure 4-8.

Figure 4-8. The highlight table with Rows and Columns swapped

Tableau has defaulted to a sequential green color palette, meaning the gradient increases proportionately with increasing Recycle to Refuse Ratio. Our eyes can easily pick out the darkest cell, and we also find the lightest rather easily. Most people can distinguish between about six different levels of intensity. If the goal were to communicate the precise values, we could have created a simple table without the highlights. Adding the highlights makes it easier to quickly compare cells and find the highest and lowest values.

It’s still not immediately obvious which community district has the second-highest ratio, though. Our eyes have to dart around to all the dark cells and read and memorize the values. Eventually we settle on Brooklyn community district #6 as having the second-highest ratio, but the task gets harder as we move along. Which has the third highest? The fourth? Is there an easier way to facilitate this assessment of rank?

What we’d like is a simple list of community districts in descending order of the ratio. To make this list, let’s first create a new field that combines the Borough and CommunityDistrict fields into a single Dimension.

Ctrl-click Borough and CommunityDistrict so they are both selected, then right-click on one of the two selected fields (either will do fine) and select Combine Fields. A new string field will appear in the Dimensions area called Borough & CommunityDistrict (Combined).

Create the list shown in Figure 4-9 by dragging Borough & CommunityDistrict to the Rows shelf, Recycle to Refuse Ratio to the Columns shelf, and Borough to the Color shelf, sorting in descending order.

Figure 4-9. Bar chart of community districts

Now we can quickly tell which are the top four or five districts, and which are the bottom four or five. Using the colored bars, we also can see some patterns, like that the districts with the lowest ratio are all in the Bronx (blue), and most of the top ten are in Manhattan (green).

But if I asked you to pick out the tenth or the fifteenth districts, you’d have to count down from the top, right? Not the easiest task to perform. How can we add rank to this list to easily pick out the order from 1 to 59?

Two Ways of Adding Rank

Let’s create a Rank field. To do so, right-click anywhere in the Dimensions or Measures panel on the left and select Create Calculated Field once again. This time, name the field Rank and enter INDEX(). As soon as the message appears in the bottom left indicating “the calculation is valid,” blue text will also appear in the top right that says “Default Table Calculation.” Click on this blue text and change the Compute using drop-down to Borough & CommunityDistrict (Combined), as shown in Figure 4-10.

Figure 4-10. Creating a rank for each of the 59 community districts

This tells Tableau that the rank should apply to the combination of Borough and CommunityDistrict.

After clicking OK twice, we now see a new field in the Measures panel called Rank. Notice that this field is continuous (the # symbol is green), but we can consider the rankings to be discrete integers, because there is no rank between any two successive ranks. To change the Rank data field type, right-click on it in the Measures panel and select Convert to Discrete. The # symbol next to Rank will turn blue, indicating that it’s discrete. We’re now ready to add Rank to the chart.

To add the Rank field to the bar chart, simply drag it to the left of Borough & CommunityDistrict (Combined) on the Rows shelf. After removing the row dividers, we have the colored, sorted, and now ranked bar chart shown in Figure 4-11.

Figure 4-11. Adding the Rank field to the bar chart

Alternatively, since the launch of Version 8.1, users can create a similar Rank field by making use of the new Rank Table Calculation. To do so, drag another instance of Recycle to Refuse Ratio to the Columns shelf, click the down arrow of the new pill and change it to Discrete, then click the down arrow again and select Quick Table Calculation and then Rank. Finally, click the down arrow once more and select Compute using, then choose Borough & CommunityDistrict (Combined). (This calculated field also lets you handle ties in different ways.)
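
The difference between the two approaches shows up when values tie. The Python sketch below uses invented ratios for five hypothetical districts: an index simply numbers the rows of the sorted list, while a rank can let tied values share a position. Competition-style ranking (1, 2, 2, 4) is used here as one common tie rule; which tie options the Rank table calculation offers is not spelled out in the text, so treat that choice as an assumption.

```python
# Invented ratios for five districts; two of them tie.
ratios = {
    "District A": 0.31,
    "District B": 0.27,
    "District C": 0.27,
    "District D": 0.19,
    "District E": 0.12,
}

# Sort in descending order of ratio, as the bar chart is sorted.
ordered = sorted(ratios.items(), key=lambda item: item[1], reverse=True)

# INDEX()-style position: simply 1, 2, 3, ... down the sorted list.
index_rank = {name: pos for pos, (name, _) in enumerate(ordered, start=1)}

# Competition-style rank: tied values share a rank (1, 2, 2, 4, ...).
competition_rank = {
    name: 1 + sum(1 for other in ratios.values() if other > value)
    for name, value in ratios.items()
}

for name, value in ordered:
    print(f"{name}: ratio={value:.2f} "
          f"index={index_rank[name]} rank={competition_rank[name]}")
```

In this sketch Districts B and C receive index positions 2 and 3 but share rank 2, so an index-based rank quietly breaks ties by sort order.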

Now the only task this chart doesn’t facilitate is gleaning precise values. We could add the labels to the right of the bars as we did in Figure 4-5 by simply dragging the Ratio field to the Label shelf, but perhaps a better approach would be to add the labels to the left of the bars so as not to interfere with the perception of the bars’ lengths.

Because Recycle to Refuse Ratio is a continuous data field (the # symbol is green), if we just drag it out onto the Rows shelf to the right of Borough & CommunityDistrict, Tableau will create a separate y-axis for each of the 59 bars. Try it and see what I mean.

We just want the ratio value itself to appear, so we’ll first convert Recycle to Refuse Ratio to a discrete field the same way we converted Rank, and then we’ll drag the ratio onto the Rows shelf to the far right, leaving a blue pill and creating the updated view shown in Figure 4-12.

Figure 4-12. Precise values added to the left of the bars

We’ve now visualized the ratio in a way that makes it easy for our audience to perform a number of different tasks very quickly:

Get a general sense of how the 59 community districts compare relative to one another
Determine which districts have the highest ratios, and which have the lowest
Get a general sense of which boroughs have districts at the top of the list, and which have districts at the bottom
Know the precise ratio for any district
Pick out any district in rank order (e.g., the 12th or the 28th)

This view is a very versatile and useful way to communicate the ratio of recycle to refuse. But it doesn’t tell us which community districts produced more trash overall per person. For all we know, the districts at the top may be recycling a lot, but they may also be producing a much higher amount per person. That’s where rates come into play.

Rates

Recall that a rate is just a special type of ratio in which the numerator and the denominator have different units. In the previous example, the ratio of recycle to refuse involved terms with the same units, namely tons. When we consider the trash production rate per person, our numerator is still tons, but now our denominator is the number of people.

But the data set we have been working with so far doesn’t include community district population. In order to determine trash production per person, we’ll need to find a way to include population data in our analysis.

Going back to the data discovery process in Figure 1-1, we’ve gone all the way around the horse track, and now we have a brand-new question that involves gathering new data.

Ideally, we would find population by community district for September 2011, which is when the DSNY collection data was taken. We can’t find that, but we can come close: 2010 census data puts us within a year’s time of the trash collection data, and is probably suitable to get a reasonable approximation of the trash production rate.

Now that we found population data, how can we create rates with it? Luckily, Tableau allows us to connect our current workbook to this new data set and “blend” it with the existing data in the workbook.

Blending Data Sources

From the Data menu of our current workbook, select Connect to Data and then Microsoft Excel, and browse to the population data file, connecting to the correct sheet in the file and importing the data as an extract. We now see the new data source appear in the Data area along with the DSNY data, as shown in Figure 4-13.

Figure 4-13. Adding a second data source to a workbook

The newly imported data set is highlighted in the Data area, and the fields for that data set are shown in the Dimensions and Measures panels below. If we click on the DSNY data set, the data fields for that data set will show below.

Notice that CommunityDistrict is in the Dimensions panel of the DSNY set, but it’s in the Measures area of the new population data source. Let’s match them up by dragging CommunityDistrict from the Measures panel of the population table into the Dimensions area.

Because both data sources now have a Dimension called Borough and a Dimension called CommunityDistrict, Tableau will link them together for us. If the fields to link had different names, we would have to manually link them using Data → Edit Relationships.
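
Conceptually, linking on Borough and CommunityDistrict behaves much like a lookup from the first source into the second on those two keys. The Python sketch below is only an analogy with invented rows; it does not describe how Tableau implements blending internally, and the `Population` field name is an assumption.

```python
# First source: invented DSNY-style rows keyed by borough and district.
collections = [
    {"Borough": "Bronx", "CommunityDistrict": 1, "RefuseTonsCollected": 1500.0},
    {"Borough": "Bronx", "CommunityDistrict": 2, "RefuseTonsCollected": 1200.0},
    {"Borough": "Queens", "CommunityDistrict": 1, "RefuseTonsCollected": 4000.0},
]

# Second source: invented population rows using the same two field names.
population = [
    {"Borough": "Bronx", "CommunityDistrict": 1, "Population": 90000},
    {"Borough": "Queens", "CommunityDistrict": 1, "Population": 190000},
]

# Index the second source by the linking fields.
pop_by_key = {
    (row["Borough"], row["CommunityDistrict"]): row["Population"]
    for row in population
}

# Attach population to each row of the first source; unmatched rows get None.
blended = [
    {**row, "Population": pop_by_key.get((row["Borough"], row["CommunityDistrict"]))}
    for row in collections
]

for row in blended:
    print(row)
```

The lookup only succeeds when both key values match exactly, which is why it helps to align the linking fields in both sources before relying on the link.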

Visualizing Rates

Now that we have linked these two data sources, our updated Tableau workbook contains both terms of the refuse per person ratio: the measurement of the amount of refuse in tons (the numerator) and the count of the number of people in each community district (the denominator). We’ll create the rate the same way we created the ratio in the previous section, but now we need to create a calculated field that includes fields from two different sources, as shown in Figure 4-14.

Figure 4-14. Creating a rate using fields from two data sources

Using the SUM aggregation type for both terms in the rate means that we can get meaningful rates for boroughs as well as community districts. Also, notice we are converting from short tons (US) to pounds (lbs.) by multiplying the numerator by 2,000, which is how many pounds are in each ton. On a per person basis, it’s much easier for us to think in terms of pounds, so this is a better unit to use for this comparison.
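
The arithmetic described here (summed tons multiplied by 2,000, divided by summed population) can be sketched in Python. The district figures below are invented for illustration, and `pounds_per_person` is a hypothetical helper name, not a Tableau function.

```python
POUNDS_PER_SHORT_TON = 2000

# Invented districts: refuse collected (short tons) and resident counts.
districts = [
    {"borough": "Staten Island", "refuse_tons": 6000.0, "population": 160000},
    {"borough": "Staten Island", "refuse_tons": 5000.0, "population": 130000},
    {"borough": "Bronx", "refuse_tons": 2000.0, "population": 100000},
]

def pounds_per_person(rows):
    """SUM(refuse tons) * 2000 / SUM(population) over the given rows."""
    total_tons = sum(r["refuse_tons"] for r in rows)
    total_people = sum(r["population"] for r in rows)
    return total_tons * POUNDS_PER_SHORT_TON / total_people

# District level: each row on its own.
for d in districts:
    print(d["borough"], round(pounds_per_person([d]), 1))

# Borough level: the same formula over all rows in the borough.
staten_island = [d for d in districts if d["borough"] == "Staten Island"]
print("Staten Island overall", round(pounds_per_person(staten_island), 1))
```

Because both terms are summed before dividing, the same function gives a sensible answer for a single district or for a whole borough, which mirrors the reason for using SUM on both terms.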

Now that we have created the rate, we can visualize it. We’ll follow the same steps that we used to create the ratio bar chart in Figure 4-12 to create the rate comparison bar chart shown in Figure 4-15.

Figure 4-15. Comparing rates in a bar chart

It’s clear from this view that the three community districts in Staten Island produced the most refuse per person in September 2011, at a rate of more than 70 pounds per person.

Warning

A word of caution about these results: often, data visualization and data analysis are best used to propose new questions to ask. Data is great at helping us make comparisons, but it doesn’t always answer “why?” and “how?” It can be easy for us to jump to conclusions based on what the data shows, but some further qualitative investigation is frequently required.

Why are the Staten Island community districts at the top of this chart? The chart itself doesn’t tell us. It helps us formulate the next question. Does this mean every person who lives on Staten Island produced 70 to 80 pounds of trash that month? No, that’s not what the data shows. The data just shows how much trash was collected from their communities. DSNY provided overall refuse collected, not just for residences, so perhaps there are industries or businesses on Staten Island that produced a lot of trash.

The point is to be careful with what we communicate, and not to communicate more than the data gives us license to say.

Summary

In this chapter, we learned how to use Calculated Fields and blended data sources to create ratios (quotients) and rates (quotients with mixed units). We also learned the Index function as well as the Rank table calculation, and created highlight tables and bar charts to compare different dimensions at both the borough and community district levels. This enabled us to go beyond overall amounts (“how much”) and to consider relative amounts (“how much per”).

In the next chapter, we’ll consider another type of normalized comparison: proportions and percentages.
