Book description
Organizations across many industries have recently created fast-growing repositories to deal with an influx of new data from many sources and often in multiple formats. To manage these data lakes, companies have begun to leave the familiar confines of relational databases and data warehouses for Hadoop and various big data solutions. But adopting new technology alone won’t solve the problem.
Based on interviews with several experts in data management, author Andy Oram provides an in-depth look at common issues you’re likely to encounter as you consider how to manage business data. You’ll explore five key topic areas:
- Acquisition and ingestion: how to handle the problems of bringing data in with a degree of automation.
- Metadata: how to keep track of when data came in and how it was formatted, and how to make it available at later stages of processing.
- Data preparation and cleaning: what you need to know before you prepare and clean your data, and what needs to be cleaned up and how.
- Organizing workflows: what you should do to combine your tasks—ingestion, cataloging, and data preparation—into an end-to-end workflow.
- Access control: how to address security and access controls at all stages of data handling.
Andy Oram, an editor at O’Reilly Media since 1992, currently specializes in programming. His work for O'Reilly includes the first books on Linux ever published commercially in the United States.
Product information
- Title: Managing the Data Lake
- Author(s): Andy Oram
- Release date: September 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491941676