O'Reilly logo

Cloudera Administration Handbook by Rohit Menon

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 2. HDFS and MapReduce

We now have a basic understanding of the Apache Hadoop architecture and its inner workings. In this chapter, we will dive deeper into the two major components of Apache Hadoop—HDFS and MapReduce, and will cover the following topics:

  • Essentials of Hadoop Distributed File System
  • The read/write operational flow in HDFS
  • Exploring HDFS commands
  • Getting acquainted with MapReduce

Essentials of HDFS

HDFS is a distributed filesystem that has been designed to run on top of a cluster of industry standard hardware. The architecture of HDFS is such that there is no specific need for high-end hardware. HDFS is a highly fault-tolerant system and can handle failures of nodes in a cluster without loss of data. The primary goal behind the ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required