High Performance Spark, 2nd Edition
by Holden Karau, Adi Polak, Rachel Warren
May 2026
Intermediate to advanced
350 pages
2h 50m
English
O'Reilly Media, Inc.
Content preview from High Performance Spark, 2nd Edition

Chapter 1. Introduction to High Performance Spark

This chapter provides an overview of what we hope you will be able to learn from this book and does its best to convince you to learn to read some Scala and consider writing your Spark jobs in Scala or Python.

Feel free to skip ahead to Chapter 2 if you already know what you’re looking for.

What Is Spark and Why Performance Matters

Spark is a high-performance, general-purpose distributed computing system that has become the most active ASF open source project, with more than 1,000 active contributors.1 (ASF currently stands for Apache Software Foundation, although there are calls to rename the foundation.)

Spark enables us to process large quantities of data, beyond what can fit on a single machine, with a high-level, relatively easy-to-use API. Spark’s design and interface are unique, and it is one of the fastest systems ...

Publisher Resources

ISBN: 9781098145842