Chapter 3. Data Intended for Human Consumption, Not Machine Consumption

Paul Murrell

This chapter describes issues that can arise when a dataset has been provided in a format that is designed mainly for consumption by human eyeballs.

Data is typically provided this way in order to allow a human to extract a particular message from the data.

The problem is that we inevitably end up wanting to do more with the data. That means working with the data using software, which means explaining the format of the data to the software, which in turn leaves us wishing that the data had been formatted for consumption by a computer, not human eyeballs.

The Data

The main high school qualification in New Zealand is called NCEA (National Certificate of Educational Achievement). A typical student will attempt to gain NCEA Level 1 in Year 11 (their eleventh year of formal education), NCEA Level 2 in Year 12, and NCEA Level 3 in Year 13. However, it is also possible for students to attempt NCEA levels in earlier years or to gain an NCEA level in a later year if they fail at the first attempt.

This leads to statistics on the number (or percentage) of students who have attained each level of NCEA by the end of each year of formal education (see Example 3-1).

Example 3-1. Number of students gaining NCEA in 2010 by level and year

               Year 11 Year 12 Year 13
NCEA (Level 1)   41072   46629   40088
NCEA (Level 2)    1050   37513   38209
NCEA (Level 3)      91     451   24688
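
To make the contrast with a machine-friendly layout concrete, the following is a minimal sketch, in Python, of how the table in Example 3-1 might be turned into one record per cell. It is an illustration only, not a method taken from this chapter: the function name parse_ncea_table, the record keys (level, year, students), and the assumption that the table arrives as plain text with exactly this layout are all hypothetical.

```python
import re

# The table from Example 3-1, pasted as plain text (assumed layout).
TABLE = """\
               Year 11 Year 12 Year 13
NCEA (Level 1)   41072   46629   40088
NCEA (Level 2)    1050   37513   38209
NCEA (Level 3)      91     451   24688
"""

def parse_ncea_table(text):
    """Return one {level, year, students} record per cell of the table.

    Hypothetical helper: it assumes column headers of the form "Year N"
    and row labels of the form "NCEA (Level N)".
    """
    header, *rows = text.strip("\n").splitlines()
    # Column headers look like "Year 11"; keep only the year number.
    years = [int(y) for y in re.findall(r"Year (\d+)", header)]
    records = []
    for row in rows:
        # A row is its label followed by one count per year column.
        match = re.match(r"NCEA \(Level (\d+)\)\s+(.*)", row)
        level = int(match.group(1))
        counts = [int(c) for c in match.group(2).split()]
        for year, count in zip(years, counts):
            records.append({"level": level, "year": year, "students": count})
    return records

records = parse_ncea_table(TABLE)
for record in records:
    print(record)
```

Each record carries its own level and year, so software no longer has to infer meaning from a value's position on the page. The cost is that the parsing rules are tied to this exact layout and would need revisiting if the labels or column spacing changed.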

The Problem: Data Formatted for Human Consumption

Tables of ...
