Chapter 6. When Things Break

One of the main promises of Hadoop is resilience: the ability to keep running, and to recover, when failures occur. Tolerance of failure is the focus of this chapter.

In particular, we will cover the following topics:

  • How Hadoop handles failures of DataNodes and TaskTrackers
  • How Hadoop handles failures of the NameNode and JobTracker
  • The impact of hardware failure on Hadoop
  • How to deal with task failures caused by software bugs
  • How dirty data can cause tasks to fail and what to do about it

Along the way, we will deepen our understanding of how the various components of Hadoop fit together and identify some areas of best practice.


With many technologies, the steps to be taken when things go wrong are rarely covered ...
