Appendix A. Solutions to the exercises

This appendix contains the solutions to the exercises presented in the book. If you have not solved them yet, I encourage you to try first. Reading the API documentation and searching other chapters of the book is fair game, but merely reading the answers won’t do you any good!

Unless specified otherwise, each code block assumes the following:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pyspark.sql.types as T
 
spark = SparkSession.builder.getOrCreate()

Chapter 2

Exercise 2.1

Eleven records. explode() generates one record for each element of each array of the exploded column. The numbers column contains two arrays, one with five elements, one with six: 5 + 6 = 11.
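To make the arithmetic concrete, here is a minimal plain-Python sketch of the counting rule. It does not run Spark: the `explode` helper below is a hypothetical stand-in for `pyspark.sql.functions.explode()`, and the array values are illustrative placeholders chosen only to have five and six elements.

```python
# Illustrative data only: two arrays, of five and six elements,
# standing in for the two records of the `numbers` column.
numbers_column = [
    [1, 2, 3, 4, 5],
    [5, 6, 7, 8, 9, 10],
]


def explode(column):
    """Plain-Python stand-in for PySpark's explode():
    emit one output record per element of each array."""
    return [element for array in column for element in array]


exploded = explode(numbers_column)
record_count = len(exploded)

# The count is the sum of the array lengths: 5 + 6.
print(record_count)
```

With a real data frame, the equivalent step would be along the lines of `df.select(F.explode(F.col("numbers")))` followed by `count()`, assuming `df` holds the `numbers` column.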

from pyspark.sql.functions import ...
