Chapter 9. Troubleshooting

Throughout this book, the notion that Hadoop is a distributed system made up of layered services has been a repeating theme. The interaction between these layers is what makes a system like this so complex and so difficult to troubleshoot. With complex moving parts, interdependent systems, sensitivity to environmental conditions and external factors, and numerous potential causes for numerous potential conditions, Hadoop starts to look like the human body. We’ll treat it as such (with my apologies to the medical field as a whole).

Differential Diagnosis Applied to Systems

A significant portion of problems encountered with systems, in general, remain so because of improper diagnosis.[21] You cannot fix something when you don’t know what’s truly wrong. The medical field commonly uses a differential diagnosis process as a way of investigating symptoms and their likelihood in order to properly diagnose a patient with a condition. Differential diagnoses, for those of us without an MD, is essentially a knowledge- and data-driven process of elimination whereby tests are performed to confirm or reject a potential condition. In fact, this isn’t as exotic a concept as it initially sounds when you think about how our brains attack these kinds of problems. It does, however, help to formalize such an approach and follow it, especially when things go wrong and your boss is standing over your shoulder asking you every five minutes whether it’s fixed yet. When things go ...

Get Hadoop Operations now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.