Data catalogs are not new; in fact they’ve been around for decades. But in the age of data lakes, self-service analytics, and data protection regulation, they’ve taken on new capabilities and renewed importance. Many products are now on the market, and they differ greatly, falling into several subcategories.
Some data catalogs focus on data discoverability, others on governance and security. Some are oriented toward relational databases and data warehouses, while others are tied to more modern data sources. Many of the products use AI and machine learning to help automate building out the catalog, but almost all of them do so differently. And while several startups compete in the field, public cloud providers have entries here, as do incumbent software megavendors.
Andrew Brust (Blue Badge Insights | ZDNet) guides you through the importance of data catalogs, covers the range of data catalog capabilities, and explores the key players and their platforms. Andrew also provides an analysis of where the space is headed and what it will need to provide to address customer needs and pain points. You’ll get up to speed on the subject quickly, and no prior data catalog knowledge is required.
Prerequisite knowledge
- A basic understanding of databases, data files, and data types
- General knowledge of data warehouses and data lakes (useful but not required)
What you'll learn
- Learn data catalog concepts, including schema, metadata, data classification, business glossaries, tagging, data set endorsement, personally identifiable information (PII), sensitive data protection, regulatory compliance, data marketplaces, and more
- Explore the role of machine learning and AI in catalog automation and relationship discovery
This session is from the 2019 O'Reilly Strata Conference in New York, NY.
- Title: Executive Briefing: Data catalogs—Concepts, capabilities, and key platforms
- Release date: February 2020
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 0636920372295
You might also like
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition
Through a series of recent breakthroughs, deep learning has boosted the entire field of machine learning. …
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition
Through a recent series of breakthroughs, deep learning has boosted the entire field of machine learning. …
The Self-Service Data Roadmap
Data-driven insights are a key competitive advantage for any industry today, but deriving insights from raw …
Designing Data-Intensive Applications
Data is at the center of many challenges in system design today. Difficult issues need to …