
7. Optimizing PySpark SQL

Raju Kumar Mishra, Bangalore, Karnataka, India
Sundar Rajan Raman, Chennai, Tamil Nadu, India

In this chapter, we look at various Spark SQL recipes that optimize SQL queries. Apache Spark is an open source framework developed with Big Data volumes in mind: it is built to handle huge volumes of data and is intended for scenarios where processing power needs to scale horizontally. Before we cover the optimization techniques used in Apache Spark, you need to understand the basics of horizontal scaling and vertical scaling.
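As a minimal sketch of the kind of query the recipes in this chapter optimize (the DataFrame, table name, and data here are illustrative and not taken from the book), you can run a SQL query through a SparkSession and ask Spark SQL to show the plan produced by its optimizer with explain():

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession, the entry point for Spark SQL.
spark = SparkSession.builder.appName("optimization-demo").getOrCreate()

# A small illustrative DataFrame; real workloads would read Big Data sources.
df = spark.createDataFrame(
    [(1, "a", 10.0), (2, "b", 20.0), (3, "a", 30.0)],
    ["id", "key", "value"],
)
df.createOrReplaceTempView("sample_table")

# Run a SQL query and inspect the plan chosen by Spark SQL;
# explain(True) prints the logical plans as well as the physical plan.
result = spark.sql(
    "SELECT key, SUM(value) AS total FROM sample_table GROUP BY key"
)
result.explain(True)

Reading the output of explain() is a useful habit when tuning queries, because it shows how Spark SQL has rewritten and scheduled the work before any data is processed.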

The term ...
