Executing the balancer command for uniform data distribution

Data in HDFS may not always be placed uniformly. There can be numerous reasons for this. One of the major reasons is the addition of new nodes to the cluster. In such a case, it's the Hadoop administrator's job to make sure that they execute the balancer command to rebalance the data load.

Getting ready

To perform this recipe, you should have performed earlier recipes.

How to do it...

In the previous recipe, we added a new node to the cluster while the other three nodes were already part of the cluster. When you execute the dfsadmin report command, you would have noticed that the data is not uniformly balanced because of the addition of a new node. In my case, here is the state of the new ...

Get Hadoop: Data Processing and Modelling now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.