Time for action – intentionally causing missing blocks
The next step should be obvious; let's kill three DataNodes in quick succession.
Tip
This is the first of the activities we mentioned that you really should not do on a production cluster. Although there will be no data loss if the steps are followed properly, there is a period when the existing data is unavailable.
The following are the steps to kill three DataNodes in quick succession:
- Restart all the nodes by using the following command:
$ start-all.sh
- Wait until hadoop dfsadmin -report shows four live nodes.
- Put a new copy of the test file onto HDFS:
$ hadoop fs -put file1.data file1.new
- Log onto three of the cluster hosts and kill the DataNode process on each.
- Wait for the usual 10 minutes ...
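The wait-then-kill sequence above can be sketched as a small shell script. This is only a sketch under stated assumptions: it parses the live-node count from a "Datanodes available: N" line in the dfsadmin report (the exact wording varies across Hadoop versions), finds DataNode PIDs with jps, and uses hypothetical hostnames host1–host3.

```shell
#!/bin/sh
# Sketch of the wait-then-kill steps; assumes a Hadoop 1.x-style report
# containing a line such as "Datanodes available: 4 (4 total, 0 dead)".

# Parse the live-node count from a dfsadmin report read on stdin.
live_nodes() {
    grep 'Datanodes available' | sed 's/[^0-9]*\([0-9]*\).*/\1/'
}

# Poll until all four DataNodes have reported in.
until [ "$(hadoop dfsadmin -report 2>/dev/null | live_nodes)" = "4" ]; do
    sleep 5
done

# On each of three cluster hosts, kill the DataNode process.
# jps lists JVM processes as "<pid> <classname>".
for host in host1 host2 host3; do   # hypothetical hostnames
    ssh "$host" "jps | awk '/DataNode/ {print \$1}' | xargs kill"
done
```

Killing the process with a plain kill (rather than kill -9) gives the DataNode a chance to shut down cleanly, which is sufficient here since the point of the exercise is simply to take the nodes offline.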