Before we delve into Datasets and data wrangling, let's take a broader view of the APIs; we will focus on the relevant functions we need. This will give us a firm foundation when we wrangle with data later in this chapter. Refer to the following diagram:
The preceding diagram shows the broader hierarchy of the
org.apache.spark.sql classes. Interestingly,
pyspark.sql mirrors this hierarchy, except for DataFrame, which is basically the Scala Dataset. What I like about the PySpark interface is that it is very succinct and crisp, offering the same power, performance, and functionality as Scala or Java. But Scala has more ...