Book description
One viable option for organizations looking to harness massive amounts of data is the data lake, a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn’t the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context through tagging and cataloging.
This practical report examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data—where it comes from, where it’s located, and how it’s being used—so you can provide safeguards and reduce risk. In the process, you’ll learn about methods for automating metadata capture.
This report also explains the main features of a data lake architecture and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include:
- Traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab
- Tooling from open source projects, including Teradata Kylo and Informatica
- Startups such as Trifacta and Zaloni that provide best-of-breed technology
Product information
- Title: Understanding Metadata
- Author(s): Federico Castanedo, Scott Gidley
- Release date: March 2017
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491974889