Data Lake for Enterprises

by Vivek Mishra, Tomcy John, Pankaj Misra
May 2017
Beginner to intermediate
596 pages
15h 2m
English
Packt Publishing

Data storage nodes (DataNode)

A DataNode's primary role in a Hadoop cluster is to store data, and jobs are executed as tasks on these same nodes. Tasks are scheduled so that batch processing happens near the data: each task is allocated, where possible, to a node that is most likely to hold the data it needs. This near-data processing keeps batch jobs efficient, because less data has to move across the network before it can be processed.
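
The locality preference described above can be illustrated with a minimal sketch. This is not Hadoop's actual scheduler; the function `assign_tasks`, the block and node names, and the slot counts are all hypothetical, chosen only to show the idea of preferring a node that already stores the data and falling back to another node when none is free.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per block, preferring a node that stores the block.

    block_locations: dict mapping block id -> list of node names holding a replica
    free_slots: dict mapping node name -> number of tasks the node can still accept
    Returns a dict mapping block id -> (node name, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, replicas in block_locations.items():
        # First choice: a node that holds a replica and still has a free slot.
        local = next((n for n in replicas if slots.get(n, 0) > 0), None)
        if local is not None:
            node, is_local = local, True
        else:
            # Fallback: any node with capacity; the block must be read remotely.
            node = next((n for n, s in slots.items() if s > 0), None)
            if node is None:
                raise RuntimeError("no free task slots in the cluster")
            is_local = False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


# Hypothetical cluster: three blocks, most replicas on node1.
blocks = {"b1": ["node1", "node2"], "b2": ["node1"], "b3": ["node1"]}
slots = {"node1": 2, "node2": 1, "node3": 1}
for block, (node, is_local) in assign_tasks(blocks, slots).items():
    print(block, node, "local" if is_local else "remote")
```

In this toy run, the first two blocks are processed on node1, where their data lives, and the third is sent to a different node only because node1 has no capacity left.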

The following figure shows the inner workings of a typical Hadoop batch process:

Figure 06: MapReduce in action

Here, we see that the job, when initiated, is divided into a number of mapper ...

ISBN: 9781787281349