Chapter 5. Identifying Job Issues
Distributed computing is great, until it isn’t. The trick is understanding when your jobs have moved from great to not so great. This chapter is all about when things go wrong and all the ways that they can go wrong. I continue to focus on a single job in this chapter; in Chapter 6, we examine failures in the bigger picture of data pipelines.
My goal is for you to learn to look out for different patterns of failure and stress in your jobs and in your designs. You will find that a number of the failures result from design-time decisions. This insight might help you build better jobs, or at least help you recognize when you have one of these issues and how you might resolve it in a future rewrite. This chapter should help you spot issues before they turn into problems that affect your customers, and your career.
Bottlenecks
Consider a stretch of highway that is jammed with traffic, crawling along at 5 miles an hour. That backup will slowly extend back for miles and cause frustration for hundreds if not thousands of people who had nothing to do with whatever caused the bottleneck.
In data processing, the only difference is that the expectations, and the impact on others, can be orders of magnitude more painful.
Bottlenecks can mercilessly bring a data processing job, or even an entire data pipeline, to its knees. Good code that has passed unit tests and run successfully for weeks or months can grind to a halt. To better understand the nature ...