7 Linked Data Principles for Data Lakes

Linked Data are based on a set of principles and technologies that exploit the architecture of the Web to represent, and provide access to, machine-readable, globally integrated information. These principles and technologies offer many advantages when applied to the implementation of data lakes, both in general and in particular domains.

This chapter provides an overview of what Linked Data means and of the general approach to creating and consuming Linked Data resources. It then shows how this approach can be used at different levels of a data lake, including basic graph-based data storage and querying, data integration and data cataloging. To illustrate the application of Linked Data principles and technologies to data lakes, a demonstration scenario is presented: the creation and application of a large data platform for a smart city, the MK Data Hub.

7.1. Basic principles

The simplest notion of Linked Data, as indicated by the capitalization, is that of a paradigm for representing and publishing data of any kind: a paradigm that comprises a set of design principles, which are in turn supported by a set of core technologies. If someone publishes a dataset and in so doing:

  • uses unique identifiers to reference each entity in the data;
  • makes it possible for those identifiers to be looked up by anyone;
  • uses standards to represent the information that is returned by looking up those identifiers;
  • or ensures ...
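The first three principles above can be illustrated with a minimal, self-contained Python sketch. It is an illustration under stated assumptions, not the chapter's own method: the identifiers under http://example.org/city/ are hypothetical, the `IRI`, `Literal`, `Triple` and `look_up` names are invented for this example, and "looking up" an identifier is simulated by filtering an in-memory list of statements rather than by issuing HTTP requests. The rdf:type and rdfs:label identifiers are the standard W3C ones, and the output follows the W3C N-Triples line format.

```python
# Minimal sketch of the first three Linked Data principles listed above.
# Assumptions: hypothetical identifiers under http://example.org/city/,
# invented helper names, and an in-memory "look-up" instead of HTTP.
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass(frozen=True)
class IRI:
    """A globally unique identifier for an entity (principle 1)."""
    value: str

    def n3(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    """A plain string value, such as a name or a label."""
    value: str

    def n3(self) -> str:
        # Escape the characters N-Triples requires inside a quoted string.
        escaped = (self.value.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
        return f'"{escaped}"'


Term = Union[IRI, Literal]


@dataclass(frozen=True)
class Triple:
    """One statement: subject, predicate, object."""
    subject: IRI
    predicate: IRI
    obj: Term

    def to_ntriples(self) -> str:
        return f"{self.subject.n3()} {self.predicate.n3()} {self.obj.n3()} ."


def look_up(graph: Iterable[Triple], iri: IRI) -> List[str]:
    """Simulate dereferencing an identifier (principle 2): return every
    statement about it, serialized in a standard format (principle 3)."""
    return [t.to_ntriples() for t in graph if t.subject == iri]


# Standard W3C vocabulary identifiers.
RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")

# Hypothetical identifiers for a smart-city dataset.
SENSOR = IRI("http://example.org/city/sensor/42")
AIR_SENSOR = IRI("http://example.org/city/AirQualitySensor")

graph = [
    Triple(SENSOR, RDF_TYPE, AIR_SENSOR),
    Triple(SENSOR, RDFS_LABEL, Literal("Air quality sensor 42")),
    Triple(AIR_SENSOR, RDFS_LABEL, Literal("Air quality sensor class")),
]

for line in look_up(graph, SENSOR):
    print(line)
```

In a real deployment, each identifier would typically be an HTTP URI resolved by a web server, and the returned description would be produced by an RDF library or triple store rather than by hand-written serialization; the sketch only shows why shared, resolvable identifiers plus a standard format make the data reusable by anyone.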
