Apache Hadoop is a distributed framework for storing and processing large quantities of data. Taking each term in that statement in turn: "distributed" means that Hadoop runs across several (tens, hundreds, or even thousands of) nodes in a cluster, and "storing and processing" means that Hadoop uses two different frameworks: the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing. This is illustrated in Figure 2-1.
© Deepak Vohra 2016
Deepak Vohra, Practical Hadoop Ecosystem, 10.1007/978-1-4842-2199-0_2
2. HDFS and MapReduce
Deepak Vohra1
(1)Apt 105, White Rock, British Columbia, Canada
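The division of labor just described — HDFS stores the data, MapReduce processes it — rests on the MapReduce programming model: a map phase emits intermediate key-value pairs, a shuffle groups those pairs by key, and a reduce phase aggregates each group. The following is a minimal plain-Java sketch of that model for the classic word-count problem. It uses only Java collections, not the actual Hadoop API; the class and method names are illustrative, and a real Hadoop job would instead extend the framework's Mapper and Reducer classes and run over HDFS input splits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of the MapReduce model (not the Hadoop API):
// map emits (word, 1) pairs, the shuffle groups pairs by key,
// and reduce sums the values in each group.
public class WordCountSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce: group intermediate pairs by key and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // In Hadoop, each line would come from an HDFS input split,
        // and map tasks would run in parallel across the cluster.
        String[] lines = {"hadoop stores data", "hadoop processes data"};
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        System.out.println(reduce(intermediate));
        // prints {data=2, hadoop=2, processes=1, stores=1}
    }
}
```

The key property this sketch captures is that map works on each record independently and reduce works on each key independently, which is what lets Hadoop parallelize both phases across the nodes of the cluster.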