Data Lakes

by Anne Laurent, Dominique Laurent, Cédrine Madera
June 2020
Beginner to intermediate
244 pages
5h 40m
English
Wiley-ISTE

5. A Use Case of Data Lake Metadata Management

Governing a data lake that holds a large volume of heterogeneous data requires metadata management. Without it, the data lake turns into a data swamp that is invisible, incomprehensible and inaccessible to users. In this chapter, we present a use case of data lake metadata management applied to the health-care field, which is particularly known for its heterogeneous data sources.

First, we present a data lake definition that is more detailed than the one given in the chapter dedicated to the data lake definition, together with the underlying data lake architecture on which we based the design of the metadata model. Second, we present a metadata classification that identifies the essential attributes adapted to the use case. Third, we introduce a conceptual metadata model that covers raw or processed data of three types: (i) structured, (ii) semi-structured and (iii) unstructured. Fourth, we validate our proposal by implementing the conceptual model in two DBMSs (one relational database and one NoSQL database).

5.1. Context

The University Hospital Center (UHC) of Toulouse is the largest hospital center in the south of France. Approximately 4,000 doctors and 12,000 hospital staff handle more than 280,000 stays and 850,000 consultations per year. The hospital's information system stores all patient data, including medical images, biological results, textual hospital reports, PMSI (Programme de médicalisation ...

ISBN: 9781786305855