One viable option for organizations looking to harness massive amounts of data is the data lake, a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn’t the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context through tagging and cataloging.
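To make "tagging and cataloging" concrete, here is a minimal sketch of a metadata catalog for a data lake. It is not the authors' method or any vendor's API. The class names `CatalogEntry` and `MetadataCatalog`, the fields, and the example paths and tags are all hypothetical. The sketch simply registers raw datasets (structured or unstructured), attaches tags to them, and lets you find datasets by tag.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one raw dataset in the lake (hypothetical schema)."""
    path: str              # where the raw data lives in the lake
    fmt: str               # e.g. "csv", "json", "text"
    structured: bool       # structured vs. unstructured data
    tags: set[str] = field(default_factory=set)


class MetadataCatalog:
    """A minimal in-memory catalog: register datasets, tag them, search by tag."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.path] = entry

    def tag(self, path: str, *tags: str) -> None:
        # Raises KeyError if the dataset was never registered.
        self._entries[path].tags.update(tags)

    def find_by_tag(self, tag: str) -> list[str]:
        # Sorted so results are deterministic.
        return sorted(p for p, e in self._entries.items() if tag in e.tags)


catalog = MetadataCatalog()
catalog.register(CatalogEntry("raw/crm/customers.csv", "csv", structured=True))
catalog.register(CatalogEntry("raw/support/call_notes.txt", "text", structured=False))
catalog.tag("raw/crm/customers.csv", "pii", "customer")
catalog.tag("raw/support/call_notes.txt", "customer")
print(catalog.find_by_tag("customer"))
print(catalog.find_by_tag("pii"))
```

Even this toy version shows the point. Without the tags, the lake is just a pile of files. With them, you can answer questions like "which datasets contain personal data?" before anyone queries them.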
This practical report examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data—where it comes from, where it’s located, and how it’s being used—so you can provide safeguards and reduce risk. In the process, you’ll learn about methods for automating metadata capture.
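As a rough illustration of automated metadata capture, the following sketch records where a CSV file came from, where it is stored, and how it is used. It also infers the file's schema at ingest instead of relying on someone to document it by hand. It uses only the Python standard library. The function names `capture_metadata` and `record_access`, the record's field names, and the `s3://` path are hypothetical and do not describe any product covered in the report.

```python
import csv
import hashlib
import io
import itertools
from datetime import datetime, timezone


def infer_type(values):
    """Guess a column type from sample values: "int", "float", or "str"."""
    if not values:
        return "str"
    for cast, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                cast(v)
            return name
        except (ValueError, TypeError):  # TypeError covers missing (None) cells
            continue
    return "str"


def capture_metadata(raw_bytes, source, location, sample_rows=100):
    """Build a metadata record for a CSV file as it lands in the lake."""
    reader = csv.DictReader(io.StringIO(raw_bytes.decode("utf-8")))
    rows = list(itertools.islice(reader, sample_rows))
    schema = {
        name: infer_type([r[name] for r in rows])
        for name in (reader.fieldnames or [])
    }
    return {
        "source": source,        # where the data came from
        "location": location,    # where it is stored in the lake
        "schema": schema,        # inferred from a sample, not declared
        "row_sample_size": len(rows),
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),  # detects later changes
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "access_log": [],        # how the data is being used
    }


def record_access(meta, user, purpose):
    """Append a usage event to the dataset's metadata record."""
    meta["access_log"].append(
        {"user": user, "purpose": purpose,
         "at": datetime.now(timezone.utc).isoformat()}
    )


data = b"id,amount,region\n1,19.99,EU\n2,5.00,US\n"
meta = capture_metadata(data, source="crm-export",
                        location="s3://lake/raw/crm/orders.csv")
record_access(meta, user="analyst1", purpose="quarterly report")
print(meta["schema"])  # {'id': 'int', 'amount': 'float', 'region': 'str'}
```

The design choice here is to capture metadata at the moment data is ingested. Provenance and schema are cheapest to record then, and the checksum lets you detect later whether the stored file has changed.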
This report also explains the main features of a data lake architecture, and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include:
- Traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab
- Tooling from open source projects, including Teradata Kylo and Informatica
- Startups such as Trifacta and Zaloni that provide best-of-breed technology
Table of contents
1. Understanding Metadata: Create the Foundation for a Scalable Data Architecture
- Key Challenges of Building Next-Generation Data Architectures
- What Is Metadata and Why Is It Critical in Today’s Data Environment?
- A Modern Data Architecture—What It Looks Like
- Automating Metadata Capture
Product information
- Title: Understanding Metadata
- Release date: March 2017
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491974889