Building a Data Web from Open Data and Free Services

A large part of the art of performing and communicating science is in designing processes that remove inaccurate or misleading results, to provide a body of evidence that clearly supports a simplifying explanation that humans can understand. Science can be seen as the process of reducing pieces of the world into intelligible models. Part of the problem with this approach is the tendency to oversimplify, either to strengthen an argument or, in the case of very complex systems, simply to make it comprehensible.

Our approach is to embrace the complexity of real measurements by making all the detail available. We aim to balance the issues that this complexity creates with the need for clear and useful data sets by filtering the primary record in as transparent a way as possible to create the primary data set. The availability of storage space on the Web at near zero cost and the wide availability of high-quality, freely hosted services make it possible to host the whole of the research record in public view. There is simply no longer any excuse for writing "data not shown." But the desire to provide access to the full record creates new problems.

The first of these is simply volume. The research record itself tends to be a large body of largely unstructured text and images. Widespread standards do not exist to represent this type of information in a way that is easily parsed by either humans or machines. Summaries and filtering are ...
