book

Managing the Data Lake

Name: Managing the Data Lake
Author: Andy Oram
ISBN: 9781491941676

by Andy Oram

September 2015

Beginner to intermediate

15 pages

31m

English

O'Reilly Media, Inc.

Content preview from Managing the Data Lake

Chapter 1. Moving to Big Data Analysis

Can you tell by sailing the surface of a lake whether it has been well maintained? Can local fish and plants survive? Dare you swim? And how about the data maintained in your organization’s data lake? Can you tell whether it’s healthy enough to support your business needs?

An increasing number of organizations maintain fast-growing repositories of data, usually from multiple sources and formatted in multiple ways, that are commonly called “data lakes.” They use a variety of storage and processing tools—especially in the Hadoop family—to extract value quickly and inform key organizational decisions.

This report looks at the common needs that modern organizations have for data management and governance. The MapReduce model—introduced in 2004 in a paper¹ by Jeffrey Dean and Sanjay Ghemawat—completely overturned the way the computing community approached big data analysis. Many other models, such as Spark, have come since then, creating excitement and seeing eager adoption by organizations of all sizes to solve the problems that relational databases were not suited for. But these technologies bring with them new demands for organizing data and keeping track of what you’ve got.

I take it for granted that you understand the value of undertaking a big data initiative, as well as the value of a framework such as Hadoop, and are in the process of transforming the way you manage your organization’s data. I have interviewed a number of experts in data ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

O’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.

Julian F.

Head of Cybersecurity

I wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.

Addison B.

Field Engineer

I’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.

Amir M.

Data Platform Tech Lead

I'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.

Mark W.

Embedded Software Engineer

Publisher Resources

ISBN: 9781492049876Errata Page

Cloud Computing

Data Engineering

Data Science

AI & ML

Programming Languages

Software Architecture

IT/Ops

Security

Design

Business

Soft Skills

Managing the Data Lake

by Andy Oram

Chapter 1. Moving to Big Data Analysis

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.