Chapter 2. Upgrading Spark

When we started writing the second edition of this book, one of our first tasks was upgrading our examples from Spark 2.2 to Spark 3.3. In our day jobs, we also often help people upgrade to new versions of Spark. Upgrading matters because it lets you take advantage of Spark's many performance improvements: some are as simple as getting your code to run on the new engine, whereas others require you to use newer APIs. In this chapter, you will learn how to identify the areas of Spark that have changed and where you may need to update your codebase.

Upgrading to newer versions of Spark is not as simple as bumping the version and basking in the joy of a new engine. While Spark officially aims to follow SemVer (semantic versioning), where it maintains API compatibility within ...
