Chapter 1. Introduction to High Performance Spark
This chapter provides an overview of what we hope you will be able to learn from this book and does its best to convince you to learn to read some Scala and consider writing your Spark jobs in Scala or Python.
Feel free to skip ahead to Chapter 2 if you already know what you’re looking for.
What Is Spark and Why Performance Matters
Spark is a high-performance, general-purpose distributed computing system that has become the most active Apache Software Foundation (ASF) open source project, with more than 1,000 active contributors.1 (The acronym ASF currently stands for Apache Software Foundation, although there are calls to rename the foundation.)
Spark enables us to process large quantities of data, beyond what can fit on a single machine, with a high-level, relatively easy-to-use API. Spark’s design and interface are unique, and it is one of the fastest systems ...