Chapter 6. When Things Break
One of the main promises of Hadoop is resilience: the ability to keep working when failures occur. This tolerance of failure is the focus of this chapter.
In particular, we will cover the following topics:
- How Hadoop handles failures of DataNodes and TaskTrackers
- How Hadoop handles failures of the NameNode and JobTracker
- The impact of hardware failure on Hadoop
- How to deal with task failures caused by software bugs
- How dirty data can cause tasks to fail and what to do about it
Along the way, we will deepen our understanding of how the various components of Hadoop fit together and identify some areas of best practice.
With many technologies, the steps to be taken when things go wrong are rarely covered ...