May 2026
Errors caught late often require expensive re-runs of jobs, so catching errors early is key to high-performance data analytics. Testing, validation, and side-by-side comparisons are three key techniques for stopping errors before they reach production. Errors that slip through can have real-world impacts, like production outages, which more than half of people have experienced (see Figure 5-11), with almost 20% experiencing serious outages from their Spark pipelines. We don’t recommend hope as a strategy for software quality; for one thing, the odds are not great (worse than half).2 Instead, in this chapter you will learn how to identify and stop many kinds of errors before they impact your users.3