Implementing a Modern Data Catalog to Power Data Intelligence

by Fadi Maali, Jason Lim

June 2022

Beginner to intermediate

38 pages

51m

English

O'Reilly Media, Inc.

What Is in a Data Catalog?; Data Catalog Features and Example Applications; A Framework to Characterize Data Catalogs; Summary
Tool-Adjunct Data Catalogs; Broad Connectivity; Intelligence; Active Governance; Domain-Specific Catalogs; Broad Connectivity; Intelligence; Active Governance; Data Catalog Platforms; Broad Connectivity; Intelligence; Active Governance; Summary
Data Catalog in an Enterprise Data Stack; Enterprise Data Lakes; The Modern Data Stack; Data Mesh; Data Fabric; Successful Implementation of Data Catalogs; Accommodate Existing Workflows for Data Users; Focus on People; Focus on Business and Technical Metadata; Have an Adoption Plan; Measure Adoption and Impact of the Data Catalog; Summary
Catalog Business Impact; Catalog Use Cases; Self-Service Business Intelligence; Data Governance and Guided Data Usage; Data Operations; Cloud and Multicloud Migration; Summary

Content preview from Implementing a Modern Data Catalog to Power Data Intelligence

Chapter 3. Implementing a Data Catalog

The many innovations and novel concepts in data analytics over the last few years have produced several conceptual frameworks and architectural approaches. From big data to data lakes, data mesh to data fabric, the field is evolving rapidly. Pundits, researchers, and vendors are still defining and debating these concepts, yet they are already shaping data analytics practices in many enterprises.

In this chapter, we briefly discuss a few of these recent architectures and concepts and show how a data catalog can support each of them. We then close the chapter with several recommendations for successfully implementing an enterprise data catalog.

Data Catalog in an Enterprise Data Stack

In the next few sections, we discuss four popular recent concepts in data analytics architecture—data lakes, the modern data stack, data mesh, and data fabric.

Enterprise Data Lakes

A data lake is a centralized repository that allows enterprises to store structured and unstructured data at any scale. Data lakes are characterized by open-ended, schema-on-use data, where structure is applied when the data is read rather than when it is stored, and by agile development.
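
To make the schema-on-use idea concrete, here is a minimal sketch in Python. It is an illustration only and does not come from the book: the sample records, the `read_with_schema` helper, and the two "views" are all hypothetical names, and a real data lake would use object storage and a query engine rather than an in-memory list.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
# No schema is enforced when the data is written.
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.90", "ts": "2022-06-01"}',
    '{"user": "bo", "amount": 5, "channel": "web"}',
    '{"user": "cy"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps a field name to a cast function. Fields missing from
    a record come back as None; fields not in the schema are ignored.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers read the same raw data, each with its own schema.
billing_view = read_with_schema(RAW_EVENTS, {"user": str, "amount": float})
marketing_view = read_with_schema(RAW_EVENTS, {"user": str, "channel": str})

print(billing_view)
print(marketing_view)
```

The same stored records serve both consumers because each one decides at read time which fields it needs and how to type them. The cost of this flexibility is that nothing in the storage layer documents what the data means, which is one reason a catalog is useful alongside a lake.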

Many enterprises have adopted data lakes as an alternative or complement to traditional data warehouses. Given the growing number of data sources and the increasing importance of data analysis, a data lake lets analysts ingest and transform data quickly. Furthermore, a data lake contains many data sources within an organization ...