© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
H. Luu, Beginning Apache Spark 3, https://doi.org/10.1007/978-1-4842-7383-8_5

5. Optimizing Spark Applications

Hien Luu
San Jose, CA, USA
 

Chapter 4 covered the major capabilities of Spark SQL for simple to complex data processing. When you use Spark to process large datasets of hundreds of gigabytes or terabytes, you encounter challenging performance issues, so it is important to know how to deal with them. Mastering Spark application performance is a broad topic. It requires a lot of research and a deep understanding of some of the key areas of Spark related to memory management and data ...
