Data Engineering with Databricks Cookbook

by Pulkit Chadha
May 2024
Beginner to intermediate
438 pages
9h 41m
English
Packt Publishing

6

Performance Tuning with Apache Spark

Apache Spark is a powerful and versatile framework for large-scale data processing. It offers high-level APIs in Scala, Java, Python, and R, as well as low-level access to the Spark core engine. Spark supports a variety of workloads, such as batch processing, streaming, machine learning, graph analytics, and SQL queries. However, to get the most out of Spark, you need to know how to optimize its performance and avoid common pitfalls.

In this chapter, you will learn how to tune the performance of Apache Spark applications.

We will cover the following recipes in this chapter:

  • Monitoring Spark jobs in the Spark UI
  • Using broadcast variables
  • Optimizing Spark jobs by minimizing data shuffling
  • Avoiding data skew
  • Caching ...

Publisher Resources

ISBN: 9781837633357