HDFS High Availability
HDFS is a master-slave cluster: the NameNode is the master, and it manages hundreds, if not thousands, of DataNodes as slaves. This design makes the NameNode a Single Point of Failure (SPOF): if it goes down for any reason, the entire cluster becomes unusable. HDFS 1.0 supports an additional master node, known as the Secondary NameNode, to help with recovery of the cluster. It does so by maintaining a copy of the filesystem metadata, but it is by no means a highly available system, because recovery requires manual intervention and maintenance work. HDFS 2.0 takes this to the next level by adding support for full High Availability (HA).
HA works by having two NameNodes in an active-passive ...
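The active-passive idea can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hadoop code: the `NameNode` and `HAPair` classes, their fields, and the `failover` method are hypothetical names invented for illustration, and the mechanisms a real HDFS HA deployment uses to share edits and detect failures are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy stand-in for a NameNode (hypothetical, not a Hadoop class)."""
    name: str
    state: str = "standby"                          # "active" or "standby"
    healthy: bool = True
    namespace: dict = field(default_factory=dict)   # path -> metadata


class HAPair:
    """Hypothetical active-passive pair: one node serves, the other mirrors."""

    def __init__(self, first: NameNode, second: NameNode):
        self.nodes = [first, second]
        first.state = "active"
        second.state = "standby"

    def active(self) -> NameNode:
        # Return the node currently serving clients, if it is healthy.
        for node in self.nodes:
            if node.state == "active" and node.healthy:
                return node
        raise RuntimeError("no healthy active NameNode")

    def write(self, path: str, meta: str) -> None:
        # The active node applies the change; a healthy standby replays it
        # so that it can take over without rebuilding the namespace.
        self.active().namespace[path] = meta
        for node in self.nodes:
            if node.state == "standby" and node.healthy:
                node.namespace[path] = meta

    def failover(self) -> NameNode:
        # Promote a healthy standby and demote the previous active node.
        standby = next(
            (n for n in self.nodes if n.state == "standby" and n.healthy),
            None,
        )
        if standby is None:
            raise RuntimeError("no healthy standby to promote")
        for node in self.nodes:
            if node.state == "active":
                node.state = "standby"
        standby.state = "active"
        return standby


if __name__ == "__main__":
    pair = HAPair(NameNode("nn1"), NameNode("nn2"))
    pair.write("/data/a.txt", "3 blocks")
    pair.nodes[0].healthy = False      # simulate the active node failing
    promoted = pair.failover()
    print(promoted.name, promoted.namespace)
```

The point of the sketch is that the standby already holds the same metadata as the active node, so promoting it does not require the manual recovery described for the Secondary NameNode.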