4. Metadata in Data Lake Ecosystems

The notion of metadata was used in information management long before the emergence of computer science, in fields such as documentation and cataloguing. In fact, it was first introduced in Greek libraries, where descriptive tags were used to store and classify manuscripts [MFI 17].

Organizing information resources (i.e. real-world objects, data, abstract concepts, services, etc.) has been the main motivation for describing, structuring and documenting them with metadata. According to the National Information Standards Organization (NISO) [RIL 17], metadata have been shown to be relevant in data management because they give data sources a context, allowing users to understand the data in the short, medium or long term.

This chapter addresses the following questions: what are metadata? What has been done in this field? Are metadata useful for data lakes?

4.1. Definitions and concepts

According to NISO [RIL 17], metadata are defined as structured information that describes, explains and locates an information resource, or otherwise aids its retrieval, use or management. Metadata are often called “data about data” or “information about information”.
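As an illustration of this definition, the following minimal Python sketch models a metadata record as structured information attached to a resource. The class name, field names, file paths and the `find_by_keyword` helper are all hypothetical and chosen only for this example; they are not taken from NISO or from any metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceMetadata:
    """Hypothetical metadata record; the field names are illustrative only."""
    title: str        # describes the resource
    description: str  # explains what the resource contains
    location: str     # locates the resource (here, an assumed file path)
    keywords: List[str] = field(default_factory=list)  # aids retrieval


def find_by_keyword(catalog, keyword):
    """Return the records whose keywords include `keyword` (case-insensitive)."""
    wanted = keyword.lower()
    return [m for m in catalog if wanted in (k.lower() for k in m.keywords)]


# A toy catalog of two resources, with made-up paths.
catalog = [
    ResourceMetadata("Sales 2017", "Monthly sales per region",
                     "/lake/raw/sales_2017.csv", ["sales", "finance"]),
    ResourceMetadata("Sensor logs", "Temperature readings",
                     "/lake/raw/sensors.json", ["IoT"]),
]

matches = find_by_keyword(catalog, "iot")
print([m.location for m in matches])
```

Here the data themselves (the CSV and JSON files) are never opened: the lookup relies only on the records describing them, which is the sense in which metadata are “data about data”.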

According to “Open Data Support”, metadata provide information giving some meaning to data (documents, images, datasets), concepts (e.g. classification schema) or real-world entities (e.g. people, organizations, locations, masterpieces, products) [DAL 13]. ...
