Data Lakes

by Anne Laurent, Dominique Laurent, Cédrine Madera
June 2020
Beginner to intermediate
244 pages
5h 40m
English
Wiley-ISTE

6 Master Data and Reference Data in Data Lake Ecosystems

The data lake relies more on the data-governance domain of the information system than on its analytics domain. Because data governance is key (for example, through a metadata strategy that keeps the data lake from turning into a data swamp), the question arises: what roles do master data and reference data play in the data lake architecture?

In this chapter, we first present the concepts of master data management (MDM) and reference data management, and then discuss their roles in the data lake concept and the value they bring.

Master data: according to Gartner's definition¹, "Master data management is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets. Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise, including customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts."
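To make the idea of "a consistent and uniform set of identifiers" more concrete, here is a minimal sketch, assuming a hypothetical customer domain: each master record carries one enterprise-wide identifier plus cross-references to the local keys that source systems use, so records landing in the data lake from different systems can be tied back to the same core entity. The class names, fields and source-system labels (`crm`, `billing`) are illustrative assumptions, not part of any MDM product or of the chapter's text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MasterCustomer:
    """Hypothetical golden record: one uniform identifier plus extended attributes."""
    master_id: str
    name: str
    country_code: str
    # (source_system, local_id) pairs that refer to this same customer
    source_ids: set[tuple[str, str]] = field(default_factory=set)


class CustomerMasterIndex:
    """Hypothetical cross-reference index from source-system keys to master records."""

    def __init__(self) -> None:
        self._by_master: dict[str, MasterCustomer] = {}
        self._xref: dict[tuple[str, str], str] = {}

    def register(self, record: MasterCustomer) -> None:
        """Store the master record and index every source key that points to it."""
        self._by_master[record.master_id] = record
        for key in record.source_ids:
            self._xref[key] = record.master_id

    def resolve(self, system: str, local_id: str) -> MasterCustomer | None:
        """Return the master record for a source-system key, or None if unknown."""
        master_id = self._xref.get((system, local_id))
        return self._by_master.get(master_id) if master_id else None


# Usage: two systems know the same customer under different local keys.
index = CustomerMasterIndex()
index.register(MasterCustomer("C-0001", "Acme Corp", "FR",
                              {("crm", "A17"), ("billing", "9921")}))
crm_view = index.resolve("crm", "A17")
billing_view = index.resolve("billing", "9921")
```

Both lookups return the same object, which is the point of master data: analytics in the lake can join on `master_id` instead of reconciling each source system's keys separately.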

Reference data: reference data refers to the data residing in code tables or lookup tables. These are normally static code tables storing values such as city and state codes, zip codes, product codes, country codes and industry classification codes. Reference data has some general characteristics: it is typically used in a read-only manner by operational, analytical and definitional systems. Reference data can ...
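The read-only, lookup-table nature of reference data can be sketched as follows. This is a minimal illustration, assuming a small hypothetical excerpt of ISO 3166-1 alpha-2 country codes and a hypothetical `enrich` step applied to raw records in the lake; `types.MappingProxyType` is used here only to make the table read-only at runtime.

```python
from types import MappingProxyType

# Hypothetical reference table: a small excerpt of ISO 3166-1 alpha-2 country codes.
# MappingProxyType gives a read-only view, mirroring how reference data is consumed.
COUNTRY_CODES = MappingProxyType({
    "FR": "France",
    "DE": "Germany",
    "US": "United States",
})


def enrich(record: dict) -> dict:
    """Return a copy of the record with the country name looked up.

    Unknown or missing codes are not dropped; they are flagged so that
    data-quality issues stay visible instead of silently disappearing.
    """
    name = COUNTRY_CODES.get(record.get("country_code"))
    return {**record, "country_name": name, "valid_country": name is not None}


# Usage: raw rows as they might land in the data lake.
rows = [{"id": 1, "country_code": "FR"}, {"id": 2, "country_code": "XX"}]
enriched = [enrich(r) for r in rows]
```

Keeping the table read-only for consumers means that changes to reference values go through whoever governs the table, rather than being patched ad hoc by each analytical job.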



ISBN: 9781786305855