Appendix C. Finding Data

In general, there are four main “sources” of data that you can turn to when you are trying to answer a question about the world. I put “sources” in quotes because these are really types of sources, not specific websites, databases, or even organizations. Instead, these represent the mechanisms that journalists, researchers, and other professionals use to collect data about the world in order to answer their questions.

Data Repositories and APIs

“Open data” access has increasingly become a feature of many governmental and scientific organizations, in an effort to improve transparency, accountability, and (especially in the scientific community) reproducibility. This means that many government agencies, nonprofit organizations, and scientific journals maintain websites where you can access structured data relevant to their work. For example, a simple web search for “nyc open data” or “baltimore open data” will bring you to those cities’ open data portals; a similar search for “johannesburg open data” will bring you first to the South African Cities Open Data Almanac (SCODA) website, but a few links down you’ll find more datasets from an organization called DataFirst, as well as the South African Data Portal hosted at http://opendataforafrica.org. All of these sites will have some data, though as we discussed in Chapter 3, the quality of that data, including its appropriateness for answering your particular question, can vary widely.
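Because the quality of portal data varies so much, a quick first look at any file you download can be worthwhile. The following is a minimal sketch of one way to do that, not a complete quality assessment: it counts the rows and the empty values in each column of a CSV. The `SAMPLE_CSV` text, its column names, and the `summarize_csv` function are all made up for illustration; they stand in for a real file and do not come from any actual portal.

```python
import csv
import io

# Made-up stand-in for a CSV file downloaded from an open data portal.
# The column names and values are illustrative, not real portal data.
SAMPLE_CSV = """complaint_id,borough,complaint_type,closed_date
1001,BROOKLYN,Noise,2021-03-01
1002,,Noise,
1003,QUEENS,,2021-03-04
"""


def summarize_csv(source):
    """Count the rows and the empty values per column in a CSV file object."""
    reader = csv.DictReader(source)
    missing = {name: 0 for name in reader.fieldnames}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in reader.fieldnames:
            # Treat empty or whitespace-only cells as missing
            if not (row[name] or "").strip():
                missing[name] += 1
    return row_count, missing


# io.StringIO lets the sample text behave like an open file
row_count, missing = summarize_csv(io.StringIO(SAMPLE_CSV))
print(f"{row_count} rows")
for name, count in missing.items():
    print(f"{name}: {count} empty")
```

To try this on a file of your own, you could pass an open file object instead of the sample text, for example `open("my_download.csv", newline="")`, where the filename is hypothetical. Empty cells are only one signal of data quality, so treat the counts as a prompt for closer inspection rather than a verdict.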

APIs are ...
