5 Data frame gymnastics: Joining and grouping
This chapter covers
- Joining two data frames together
- Selecting the right type of join for your use case
- Grouping data and understanding the GroupedData transitional object
- Breaking the GroupedData with an aggregation method
- Filling null values in your data frame
In chapter 4, we looked at how to transform a data frame by selecting, dropping, creating, renaming, and reordering columns, as well as summarizing them. Those operations constitute the foundation for working with a data frame in PySpark. In this chapter, I will complete the review of the most common operations you will perform on a data frame: linking or joining data frames, as well as grouping data (and performing operations on the