Time for action – killing a DataNode process
Firstly, we'll kill a DataNode. Recall that the DataNode process runs on each host in the HDFS cluster and is responsible for the management of blocks within the HDFS filesystem. Because Hadoop uses a default replication factor of 3 for blocks, a single DataNode failure should have no direct impact on availability; rather, it will cause some blocks to temporarily fall below the replication threshold. Execute the following steps to kill a DataNode process:
- Firstly, check the original status of the cluster and confirm that everything is healthy. We'll use the dfsadmin command for this:

```
$ hadoop dfsadmin -report
Configured Capacity: 81376493568 (75.79 GB)
Present Capacity: 61117323920 ...
```
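With the healthy baseline confirmed, the next step is to kill the DataNode process on one of the worker hosts. As a minimal sketch of how this typically looks with standard Hadoop 1.x tooling (these commands are not quoted from this section, and <datanode-pid> is a placeholder for the process ID that jps reports):

```
# On the host whose DataNode we want to fail; jps ships with the JDK
# and lists running Java processes as "<pid> <MainClass>":
$ jps | grep DataNode
# Kill the process abruptly (substitute the PID reported above), so the
# NameNode only learns of the loss through missed heartbeats:
$ kill -9 <datanode-pid>
# Back on the client, re-check cluster health; after the heartbeat
# timeout the report will show one fewer live node, and fsck will
# report under-replicated blocks until HDFS re-replicates them:
$ hadoop dfsadmin -report
$ hadoop fsck /
```

Using kill -9 rather than a clean shutdown matters here: the point of the exercise is a sudden, unannounced failure, which is exactly the scenario the replication factor is designed to absorb.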