Chapter 5. (Re)Organizing the Web's Data
The first, and sometimes hardest, part of doing any data analysis is acquiring the data from which you hope to extract information. Whether you want to look at your personal spending habits, calculate your next trade in fantasy baseball, or compare a politician's investment returns to your own, the data you need is usually there on the web with some sense of order to it, but it's probably not in a form that's very useful for analysis. If this is the case, you'll need to either gather the data manually or write a script to collect it for you.
The granddaddy of all data formats is the data table, with a column
for each attribute and a row for each observation. You've seen this if
you've ever used Microsoft Excel, relational databases, or R's data.frame
object.
Table 5-1. An example data table
Date | Blog | Posts |
---|---|---|
2012-01-01 | adamlaiacano | 2 |
2012-01-01 | david | 4 |
2012-01-01 | dallas | 6 |
2012-01-02 | adamlaiacano | 0 |
2012-01-02 | david | 4 |
2012-01-02 | dallas | 6 |
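To make the "column per attribute, row per observation" idea concrete, here is a minimal Python sketch that holds Table 5-1 as a list of records and totals the posts for each blog. The CSV layout and the helper names `load_rows` and `posts_per_blog` are assumptions made for illustration; they are not taken from the book.

```python
import csv
import io
from collections import defaultdict

# Table 5-1 written out as CSV text (an assumed layout for this sketch):
# one column per attribute, one row per observation.
TABLE_5_1 = """\
Date,Blog,Posts
2012-01-01,adamlaiacano,2
2012-01-01,david,4
2012-01-01,dallas,6
2012-01-02,adamlaiacano,0
2012-01-02,david,4
2012-01-02,dallas,6
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting Posts to an integer."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        record["Posts"] = int(record["Posts"])
        rows.append(record)
    return rows


def posts_per_blog(rows):
    """Sum the Posts column for each distinct value of the Blog column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Blog"]] += row["Posts"]
    return dict(totals)


rows = load_rows(TABLE_5_1)
totals = posts_per_blog(rows)
print(totals)
```

Because every row has the same attributes, a question such as "how many posts did each blog publish?" reduces to grouping on one column and summing another. That regularity is what makes the table shape so convenient for analysis.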
Most websites store their data behind the scenes in tables within relational databases, and if those tables were accessible to the computing public, this chapter of Bad Data Handbook wouldn't need to exist. However, it's a web designer's job to make this information visually appealing and interpretable, which usually means they'll only present the reader with a relevant subset of the dataset, such as a single company's stock price over a specific date range, or recent status updates from ...