Managing the Data Lake

by Andy Oram
September 2015
Beginner to intermediate
15 pages
31m
English
O'Reilly Media, Inc.

Chapter 1. Moving to Big Data Analysis

Can you tell by sailing the surface of a lake whether it has been well maintained? Can local fish and plants survive? Dare you swim? And how about the data maintained in your organization’s data lake? Can you tell whether it’s healthy enough to support your business needs?

An increasing number of organizations maintain fast-growing repositories of data, usually from multiple sources and formatted in multiple ways, that are commonly called “data lakes.” They use a variety of storage and processing tools—especially in the Hadoop family—to extract value quickly and inform key organizational decisions.

This report looks at the common needs that modern organizations have for data management and governance. The MapReduce model, introduced in 2004 in a paper¹ by Jeffrey Dean and Sanjay Ghemawat, completely overturned the way the computing community approached big data analysis. Many other models, such as Spark, have followed, generating excitement and eager adoption by organizations of all sizes for problems that relational databases were not suited to. But these technologies bring with them new demands for organizing data and keeping track of what you’ve got.

I take it for granted that you understand the value of undertaking a big data initiative, as well as the value of a framework such as Hadoop, and are in the process of transforming the way you manage your organization’s data. I have interviewed a number of experts in data ...

ISBN: 9781492049876