Architecting Data Lakes, 2nd Edition

Name: Architecting Data Lakes, 2nd Edition
Author: Ben Sharma
ISBN: 9781492032991

by Ben Sharma

April 2018

Beginner to intermediate

55 pages

1h 15m

English

O'Reilly Media, Inc.

1. Overview
  Succeeding with Big Data
  Definition of a Data Lake
  The Differences Between Data Warehouses and Data Lakes
  The Business Case for Data Lakes
  Drawbacks of Data Lakes
  Succeeding with Big Data
2. Designing Your Data Lake
  Cloud, On-Premises, Multicloud, or Hybrid
  Data Storage and Retention
  Data Lake Processing
  Data Lake Management and Governance
  Advanced Analytics and Enterprise Reporting
  The Zaloni Data Lake Reference Architecture
  Zone 1: The Transient Landing Zone
  Zone 2: The Raw Zone
  Zone 3: The Trusted Zone
  Zone 4: The Refined Zone
  The Sandbox
3. Curating the Data Lake
  Integrating Data Management
  Data Ingestion
  Data Governance
  Data Catalog
  Capturing Metadata
  Data Privacy
  Storage Considerations via Data Life Cycle Management
  Data Preparation
  Benefits of an Integrated Approach
4. Deriving Value from the Data Lake
  The Executive
  The Data Scientist
  The Business Analyst
  The Downstream System
  Self-Service
  Controlling Access
  Crowdsourcing
  Data Lakes in Different Industries
  Health and Life Sciences
  Financial Services
  Telecommunications
  Retail
5. Looking Ahead
  Logical Data Lakes
  Federated Queries
  Enterprise Data Marketplaces
  Machine Learning and Intelligent Data Lakes
  The Internet of Things
  In Conclusion
  A Checklist for Success
  Business-Benefit Priority List
  Architectural Oversight
  Security Strategy
  I/O and Memory Model
  Workforce Skillset Evaluation
  Operations Plan
  Disaster Recovery Plan
  Communications Plan
  Five-Year Vision

Content preview from Architecting Data Lakes, 2nd Edition

Chapter 2. Designing Your Data Lake

Determining what technologies to employ when building your data lake stack is a complex undertaking. You must consider storage, processing, data management, and so on. Figure 2-1 shows the relationships among these tasks.

Cloud, On-Premises, Multicloud, or Hybrid

In the past, most data lakes resided on-premises. That has shifted dramatically in recent years, with most companies now looking to the cloud to replace or augment their on-premises implementations.

Whether to use on-premises or cloud storage and processing is a complicated and important decision point for any organization. The pros and cons of each could fill a book and are highly dependent on the individual implementation. Generally speaking, on-premises storage and processing offer tighter control over data security and data privacy, whereas public cloud systems offer highly scalable and elastic storage and computing resources that meet enterprises’ need for large-scale processing and data storage without the overhead of provisioning and maintaining expensive infrastructure.

With tools and technologies in the ecosystem changing rapidly, we have also seen many cloud-based data lakes used as incubators for dev/test environments, where new tools and technologies can be evaluated at a rapid pace before picking the right one to ...

Publisher Resources

ISBN: 9781492033004

Figure 2-1. The data lake technology stack
