Installing and configuring HDFS in cluster mode

First of all, you need to enable passwordless (key-based) SSH access in both directions between all master nodes (name node and secondary name node) and slaves, as described in previous sections. You will also need a Java environment on every node, as most of Hadoop is written in Java.
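Before going further, it can help to confirm both prerequisites on every node. The following is a minimal sketch, not a definitive procedure: the host names are hypothetical placeholders, and it only tests SSH from the machine it runs on, so you would repeat it from the other nodes to cover both directions. It relies on the standard OpenSSH option `BatchMode=yes`, which makes `ssh` fail rather than prompt for a password.

```python
import subprocess

# Hypothetical host names (assumption); replace with your own
# name node, secondary name node, and slave hosts.
HOSTS = ["master", "secondary", "slave1", "slave2"]


def check_command(host):
    """Build an ssh command that runs `java -version` on the host.

    BatchMode=yes makes ssh fail instead of asking for a password,
    so success implies that key-based login works.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=5",
        host,
        "java", "-version",
    ]


def check_host(host):
    """Return True if key-based SSH works and Java runs on the host."""
    try:
        result = subprocess.run(
            check_command(host),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing locally, or the host did not answer in time.
        return False
    return result.returncode == 0
```

Calling `check_host(host)` for each entry in `HOSTS` gives a quick pass/fail view of the cluster. A `False` result means that either the key-based login or the Java installation on that host needs attention.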

When you add nodes to your cluster, you need to copy all of your configuration and your Hadoop folder to each new node. The same applies to all components of Hadoop, including HDFS, YARN, MapReduce, and so on.

It is a good idea to have a shared network drive that is accessible from all hosts, as this makes file sharing easier. Alternatively, you can write a simple shell script that copies the files to each host using SCP. So, create a ...
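The SCP-based copy can be sketched as follows. This is an illustration in Python rather than the shell script the text refers to, and it makes several assumptions: the installation path `/opt/hadoop`, the destination parent directory `/opt`, and the host names are all hypothetical. It also assumes that passwordless SSH is already working, so `scp` does not prompt.

```python
import subprocess

# Assumed values for illustration only; adjust to your environment.
HADOOP_DIR = "/opt/hadoop"      # hypothetical local Hadoop folder
DEST_PARENT = "/opt"            # hypothetical parent directory on each host
HOSTS = ["slave1", "slave2"]    # hypothetical target hosts


def scp_command(host, src=HADOOP_DIR, dest_parent=DEST_PARENT):
    """Build the scp command that copies the Hadoop folder to one host.

    -r copies the directory recursively, -q suppresses the progress meter.
    """
    return ["scp", "-r", "-q", src, f"{host}:{dest_parent}"]


def copy_to_all(hosts, dry_run=True):
    """Copy the Hadoop folder to every host and return the commands used.

    With dry_run=True (the default) the commands are only printed,
    which lets you review them before anything is transferred.
    """
    commands = [scp_command(host) for host in hosts]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a copy fails.
            subprocess.run(command, check=True)
    return commands
```

Running `copy_to_all(HOSTS)` prints one `scp` command per host. Passing `dry_run=False` performs the copies and stops at the first failure, which avoids leaving the cluster with a partially updated configuration without you noticing.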
