Chapter 5. Conclusion
In this report, we described what a modern data catalog is, how to think about incorporating it into your modern data tech stack, and the business value you should expect from it. We provided a framework that characterizes a data catalog along three aspects, as described in “A Framework to Characterize Data Catalogs”:
- Broad connectivity
Data catalogs with broad connectivity have flexible and extensible data models. They capture metadata and represent not only the data assets in an enterprise but also related entities, such as metrics, charts, AI features, and users. Catalogs with broad connectivity are designed to integrate easily with other systems in an enterprise. They expose their internal services via open and expressive APIs to allow for further extensibility.
- Intelligence
Intelligence allows catalogs to go beyond capturing only explicit metadata. It enables catalogs to incorporate human knowledge, both passively (by tracking human usage and popularity of assets) and actively (by crowdsourcing tribal knowledge and incorporating users’ feedback). These catalogs employ advanced techniques, such as machine learning and NLP, to enrich collected metadata, extract links and relationships, and infer implicit and missing information.
- Active data governance
Active governance guides users as they find and use data. A data catalog with active governance will surface compliance information about sensitive data at the point of use, so as to encourage ...