© Raju Kumar Mishra and Sundar Rajan Raman 2019
Raju Kumar Mishra and Sundar Rajan Raman, PySpark SQL Recipes, https://doi.org/10.1007/978-1-4842-4335-0_5

5. Data Merging and Data Aggregation Using PySparkSQL

Raju Kumar Mishra (1) and Sundar Rajan Raman (2)
(1) Bangalore, Karnataka, India
(2) Chennai, Tamil Nadu, India

Data merging and data aggregation are essential parts of the day-to-day work of PySparkSQL users. This chapter covers the following recipes:
  • Recipe 5-1. Aggregate data on a single key
  • Recipe 5-2. Aggregate data on multiple keys
  • Recipe 5-3. Create a contingency table
  • Recipe 5-4. Perform joining operations on two DataFrames
  • Recipe 5-5. Vertically stack two DataFrames
  • Recipe 5-6. Horizontally stack two DataFrames
  • Recipe 5-7. Perform ...
