Chapter 2. Organize Data: Design a Robust Architecture for Search

You can’t discover and map the entire world in a day. Likewise, you can’t discover and map the IT landscape of your organization in one go. As you discovered in Chapter 1, the effectiveness of a data catalog depends heavily on how well it is structured and managed, which in turn determines how well it can be searched. Although it may sound straightforward, organizing your assets and their metadata isn’t simple. You will have to ask yourself: What is the most logical way to group data? What is the most relevant metadata for my data assets? How do my data assets relate to one another? Can they relate in multiple ways? And what is the interplay between how confidential data is and how sensitive it is?

In this chapter, we’ll go through these kinds of questions and walk through the process of gathering and organizing the assets in a data catalog. We’ll begin with how to organize domains, proceed to a brief discussion of how you populate the domains with data, and end with how to organize your data once it’s represented in the data catalog.

Let’s first have a look at how you organize domains.

Organizing Domains in the Data Catalog

As I discussed in Chapter 1, a domain groups assets that logically belong together. Accordingly, the first thing you need to do is create your domains. You do not need to create them all at once; create only the ones you need to begin pushing or pulling your first data sources. Once you have that, you can then organize the ...
