High Performance Spark, 2nd Edition

by Holden Karau, Adi Polak, Rachel Warren
May 2026
Intermediate to advanced
350 pages
2h 50m
English
O'Reilly Media, Inc.

Chapter 5. Testing, Validation, and Side-by-Side Runs

Errors that are caught late often require expensive reruns of jobs, so catching errors early is key to high-performance data analytics. Testing, validation, and side-by-side comparisons are three key techniques for stopping errors before they reach production. These errors can have real-world impacts, such as production outages, which more than half of people have experienced (see Figure 5-1¹), with almost 20% experiencing serious outages from their Spark pipelines. We don't recommend hope as a strategy for software quality; for one thing, the odds are not great (worse than half).² Instead, in this chapter, you will learn how to identify and stop many kinds of errors before they impact your users.³

Figure ...
ISBN: 9781098145842