Martin, Fernanda, and I came to this project in the mindset of researchers: we wanted to understand how best to support social interaction in the visual analysis process. Our choice of data set was not predetermined, though it was clear that a good data domain would satisfy some specific properties: we wanted a large, real-world data set, relevant to a general audience, and rich enough to warrant many different analyses. According to these criteria, census data seemed ideal. I had also long been interested in making census data more publicly accessible: I believe it is an important lens through which we might better understand ourselves and our history.

I started by rummaging through the U.S. Census Bureau's website. This proved only mildly productive. The Census Bureau provides a number of data sets at various levels of aggregation (e.g., by zip code, metro area, region), but this rich data is only available for recent census decades. I also realized that I was in a bit over my head. I had much to learn about the ins and outs of how census data has been collected and modeled over the decades. For example, the questions and categories used by the Census Bureau have evolved over time, so even having data for every census year does not guarantee that the data can be easily compared.

In general, one should not dive into visualization design before gaining at least a basic familiarity with the data domain. So my next step was to meet with ...
