High availability

Losing the NameNode can bring down the cluster in both Hadoop 1.x and Hadoop 2.x. Hadoop 1.x offered no easy way to recover, whereas Hadoop 2.x introduced high availability (an active-passive setup) to help recover from NameNode failures.

The following diagram shows how high availability works:

In Hadoop 3.x, you can run two passive (standby) NameNodes alongside the active NameNode, together with five JournalNodes, to help recover from catastrophic failures:

  • NameNode machines: The machines on which you run the active and standby NameNodes. They should have equivalent hardware to each other and to what would be used in a non-HA ...
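
The choice of five JournalNodes can be illustrated with some simple arithmetic. The sketch below assumes the Quorum Journal Manager behavior, in which an edit counts as written once a majority of JournalNodes acknowledge it, so N JournalNodes tolerate (N - 1) / 2 failures. The function names are illustrative only and are not part of any Hadoop API.

```python
def journal_quorum(journal_nodes: int) -> int:
    """Smallest majority of JournalNodes that must acknowledge an edit."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """How many JournalNodes can be lost while a majority remains."""
    return journal_nodes - journal_quorum(journal_nodes)


if __name__ == "__main__":
    # Compare a three-node and a five-node JournalNode set.
    for n in (3, 5):
        print(f"{n} JournalNodes: quorum={journal_quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this assumption, five JournalNodes keep accepting edits with two of them down, whereas three JournalNodes tolerate only one failure. An even count adds no extra tolerance over the odd count below it, which is why odd numbers are the usual choice.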
