Chapter 6. When Things Break

One of the main promises of Hadoop is resilience: the ability to survive failures when they do happen. This tolerance of failure is the focus of this chapter.

In particular, we will cover the following topics:

How Hadoop handles failures of DataNodes and TaskTrackers
How Hadoop handles failures of the NameNode and JobTracker
The impact of hardware failure on Hadoop
How to deal with task failures caused by software bugs
How dirty data can cause tasks to fail and what to do about it

Along the way, we will deepen our understanding of how the various components of Hadoop fit together and identify some areas of best practice.

Failure

With many technologies, the steps to be taken when things go wrong are rarely covered ...

Hadoop Beginner's Guide by Garry Turkington
