What Hadoop is and why it is important

The Apache Hadoop is a collection of open source software that enables distributed storage and processing of large datasets across a cluster of different types of computer systems. The Apache Hadoop framework consists of the following four key modules:

  • Apache Hadoop Common
  • Apache Hadoop Distributed File System (HDFS)
  • Apache Hadoop MapReduce
  • Apache Hadoop YARN (Yet Another Resource Manager)

Each of these modules covers different capabilities of the Hadoop framework. The following diagram depicts their positioning in terms of applicability for Hadoop 3.X releases:

Apache Hadoop Common consists of shared ...

Get Apache Hadoop 3 Quick Start Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.