Chapter 2. Types of Data Catalogs

In this chapter, we look at the different types of data catalogs. The goal is not to categorize data catalogs into separate groups, but rather to provide a simple framework for how a data catalog’s focus influences the three main characteristics we discussed in the first chapter: broad connectivity, intelligence, and active governance.

The three main types we discuss here are tool-adjunct data catalogs, domain-specific data catalogs, and data catalog platforms.

Tool-Adjunct Data Catalogs

Tool-adjunct data catalogs are built as part of an existing tool. Typically, these catalogs are not part of the tool’s main offering, but an add-on that enhances the user’s experience or extends the tool’s functionality.

An early example of such a catalog is the internal data catalog of a relational database management system. This catalog stores the metadata essential for operating the database, such as the list of tables, the columns in each table, and the list of views. Although this metadata exists mainly for internal operations, relational databases typically expose it to users. They also support a limited set of metadata meant for human consumption, such as descriptions of tables and fields.
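As a rough illustration of how a relational database exposes its internal catalog, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is only a convenient stand-in here; other database systems expose similar metadata through their own catalog views. The helper name `describe_catalog` and the sample tables and view are hypothetical, chosen for this example.

```python
import sqlite3


def describe_catalog(conn):
    """Read table, column, and view names from SQLite's internal catalog."""
    # sqlite_master is SQLite's built-in catalog table; it holds one row
    # per schema object (table, view, index, trigger).
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()

    catalog = {"tables": {}, "views": []}
    for obj_type, name in rows:
        if name.startswith("sqlite_"):
            continue  # skip SQLite's own bookkeeping tables
        if obj_type == "table":
            # PRAGMA table_info returns one row per column:
            # (cid, name, type, notnull, dflt_value, pk)
            cols = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            catalog["tables"][name] = [col[1] for col in cols]
        else:
            catalog["views"].append(name)
    return catalog


# Hypothetical sample schema, created in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.execute("CREATE VIEW big_orders AS SELECT * FROM orders WHERE total > 100")

catalog = describe_catalog(conn)
print(catalog)
```

The same metadata the database relies on to operate (table, column, and view names) is available to any user who can run a query, which is what makes the internal catalog a simple form of tool-adjunct data catalog.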

Tool-adjunct data catalogs have evolved to focus more on human users in addition to internal operations. Hive Metastore, for example, aims to facilitate discovery of data in the Hadoop ecosystem. It supports custom metadata such ...
