Chapter 16. Beautifying Data in the Real World

Jean-Claude Bradley

Rajarshi Guha

Andrew Lang

Pierre Lindenbaum

Cameron Neylon

Antony Williams

Egon Willighagen

The Problem with Real Data

There are at least two problems with collecting "beautiful data" in the real world and presenting it to the interested public. The first is that the universe is inherently noisy. In most cases, collecting the same piece of data twice will not give the same answer, because the collection process can never be made completely error-free. Fluctuations in temperature, pressure, humidity, power supply, or water and reagent quality, limits on the precision of weighing, and simple human error all conspire to obscure the "correct" answer. The art of experimental measurement lies in designing the data collection process to minimize the degree to which random variation and operator error confuse the results. In the best cases this involves carefully refining the experimental design while monitoring the size and sources of error. In the worst case it leads to people repeating experiments until they get the answer they were expecting.

The traditional experimental approach to dealing with the uncertainty created by errors is to repeat the experiment and subject the results to statistical analysis. Examples of repetition can be found in most issues of most scientific journals by looking for a figure panel that contains the text "typical results are shown." "Typical results" is generally taken to mean "the best data set we ...
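In its simplest form, subjecting repeated measurements to statistical analysis means reporting their spread alongside their average rather than picking a single "typical" run. The sketch below assumes independent replicate measurements of one quantity; the helper name `summarize_replicates` and the readings are hypothetical and purely illustrative, not taken from the chapter. It uses only Python's standard `statistics` and `math` modules.

```python
import math
import statistics


def summarize_replicates(values):
    """Summarize repeated measurements of the same quantity (hypothetical helper).

    Returns (mean, sample standard deviation, standard error of the mean).
    Assumes the replicates are independent measurements of one quantity.
    """
    if len(values) < 2:
        # A spread cannot be estimated from a single measurement.
        raise ValueError("need at least two replicate measurements")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(len(values))      # uncertainty of the mean shrinks as 1/sqrt(n)
    return mean, sd, sem


# Invented replicate readings, for illustration only.
readings = [10.2, 9.8, 10.1, 10.4, 9.9]
mean, sd, sem = summarize_replicates(readings)
print(f"mean = {mean:.3f}, sd = {sd:.3f}, sem = {sem:.3f}")
```

Reporting all three numbers, rather than one "best" data set, lets a reader judge how much of the result is signal and how much is noise; collecting more replicates narrows the standard error but does not remove systematic errors.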
