Chapter 3. Implementing a Data Catalog

The many innovations and novel concepts in data analytics over the last few years have produced several conceptual frameworks and architectural approaches. From big data to data lakes, and from data mesh to data fabric, the field is evolving rapidly. Pundits, researchers, and vendors are still defining and debating these concepts, yet they are already shaping data analytics practices in many enterprises.

In this chapter, we briefly discuss a few of these recent architectures and concepts as well as how a data catalog can support each of them. We then close the chapter with a number of recommendations for successful implementation of an enterprise data catalog.

Data Catalog in an Enterprise Data Stack

In the next few sections, we discuss four popular recent concepts in data analytics architecture—data lakes, the modern data stack, data mesh, and data fabric.

Enterprise Data Lakes

A data lake is a centralized repository that allows an enterprise to store all of its structured and unstructured data at any scale. Data lakes are characterized by open-ended, schema-on-use data and agile development.
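To make schema-on-use concrete, here is a minimal sketch in Python. It assumes raw records land in the lake as JSON lines with no schema enforced on write; the sample records, the field names, and the helper `read_with_schema` are hypothetical and chosen only for illustration, not taken from any particular data lake product.

```python
import json

# Hypothetical raw records as they might land in a lake.
# Nothing validates or reshapes them at write time.
RAW_EVENTS = [
    '{"user_id": "42", "amount": "19.99", "country": "DE"}',
    '{"user_id": "7", "amount": "5", "referrer": "newsletter"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps field names to cast functions. Fields absent from a
    record come back as None; fields not named in the schema are ignored.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers read the same raw data through different schemas.
billing_view = read_with_schema(RAW_EVENTS, {"user_id": int, "amount": float})
marketing_view = read_with_schema(RAW_EVENTS, {"user_id": int, "referrer": str})
```

Each consumer decides which fields it needs and how to type them, which is what keeps ingestion fast. It also means the knowledge of what the raw data contains lives with the readers rather than with the store.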

Many enterprises have adopted data lakes as an alternative or complement to traditional data warehouses. Given the growing number of data sources and the increasing importance of data analysis, a data lake allows analysts to ingest and transform data quickly. Furthermore, a data lake contains many data sources within an organization ...

