6 Master Data and Reference Data in Data Lake Ecosystems

Within the information system, the data lake relies more on the data-governance domain than on the analytics domain. Because data governance is key, as with the metadata strategy that keeps the data lake from becoming a data swamp, the question is: what roles do master data and reference data play in the data lake architecture?

In this chapter, we first present the concepts of master data management (MDM) and reference data management, and then discuss their roles in the data lake concept and the value they bring.

Master data: according to the Gartner definition¹: Master data management is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets. Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise including customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts.

Reference data: reference data is the data residing in code tables or lookup tables. These are normally static tables storing values such as city and state codes, zip codes, product codes, country codes and industry classification codes. Reference data shares some general characteristics: it is typically used in a read-only manner by operational, analytical and definitional systems. Reference data can ...
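To make the distinction concrete, here is a minimal Python sketch, not taken from the chapter, of a read-only code table (reference data) that a core business entity (master data) points to. The `COUNTRY_CODES` table, the `Customer` class and the two helper functions are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from types import MappingProxyType

# Hypothetical reference data: a static code table of country codes.
# MappingProxyType exposes a read-only view, mirroring the read-only
# way consuming systems typically use reference data.
_COUNTRY_CODES = {"FR": "France", "DE": "Germany", "US": "United States"}
COUNTRY_CODES = MappingProxyType(_COUNTRY_CODES)


@dataclass(frozen=True)
class Customer:
    """Hypothetical master data record: an identifier plus attributes."""
    customer_id: str   # consistent identifier for the core entity
    name: str          # extended attribute
    country_code: str  # points into the reference code table


def validate_customer(customer, codes=COUNTRY_CODES):
    """Return True if the record's country code exists in the code table."""
    return customer.country_code in codes


def country_name(customer, codes=COUNTRY_CODES):
    """Resolve the record's country code to its label via the code table."""
    return codes[customer.country_code]
```

In this sketch, a record such as `Customer("C-001", "Acme", "FR")` validates against the code table, while a record carrying an unknown code does not. An attempt to assign into `COUNTRY_CODES` raises a `TypeError`, which is one simple way to enforce read-only access in application code.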
