Installing and configuring HDFS in cluster mode

First, enable passwordless (key-based) SSH login in both directions between all master nodes (the name node and secondary name node) and all slave nodes, as described in previous sections. Every node also needs a Java environment, because most of Hadoop is written in Java.
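As a quick sanity check of these two prerequisites, a small script along the following lines could be run from each master node. This is only a sketch: the function name check_host is made up for this example, and the hostnames are whatever you pass on the command line. It relies on the OpenSSH options BatchMode (fail instead of prompting for a password) and ConnectTimeout.

```shell
#!/bin/sh
# Sketch: verify passwordless SSH and a Java runtime on each host given as an argument.
# check_host is a hypothetical helper name, not part of Hadoop.

check_host() {
  # BatchMode=yes makes ssh fail rather than prompt for a password,
  # so a success here means key-based login works AND java is on the PATH.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" 'java -version' >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: FAILED"
  fi
}

for host in "$@"; do
  check_host "$host"
done
```

A FAILED line does not say which of the two checks went wrong; logging in to that host manually will show whether the problem is the SSH key setup or a missing Java installation.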

When you add nodes to your cluster, you need to copy your entire configuration and your Hadoop folder to each new node. The same applies to all components of Hadoop, including HDFS, YARN, and MapReduce.

It is a good idea to have a shared network drive that all hosts can access, as this makes file sharing easier. Alternatively, you can write a simple shell script to make multiple copies using SCP. So, create a ...
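The SCP approach mentioned above might look roughly like the following. This is a minimal sketch under stated assumptions, not the script from the original text: HADOOP_DIR and DRY_RUN are hypothetical variable names, /opt/hadoop is an assumed install location, and the target hosts are passed as command-line arguments. With DRY_RUN left at 1, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: copy the local Hadoop folder to the same parent directory on each host.
# HADOOP_DIR and DRY_RUN are hypothetical names chosen for this example.
HADOOP_DIR="${HADOOP_DIR:-/opt/hadoop}"  # assumed install location
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands only; 0 = actually copy

copy_to_host() {
  # $1 is the target hostname; the remote parent directory must already exist
  target="$1:$(dirname "$HADOOP_DIR")/"
  if [ "$DRY_RUN" = "1" ]; then
    echo "scp -r $HADOOP_DIR $target"
  else
    scp -r "$HADOOP_DIR" "$target"
  fi
}

distribute() {
  for host in "$@"; do
    # keep going if one host fails, but report it on stderr
    copy_to_host "$host" || echo "copy to $host failed" >&2
  done
}

distribute "$@"
```

Saved under a name of your choosing, the script would be called with the list of nodes as arguments, first with the default DRY_RUN=1 to review the commands and then with DRY_RUN=0 to perform the copy. Because passwordless SSH is already set up, scp should not prompt for credentials.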
