In this chapter, it’s time to dig a little deeper into Python tricks that’ll make your life easier. We’ll revisit a lot of topics that we’ve already talked about, but take them a step further. First up, we’ll remind ourselves why ETL is important.
After we’ve reacquainted ourselves with ETL, we’ll look into the Spark UI and how it can help us monitor what’s happening in the system when we run a query. Then we’ll take a deep dive into a lot of new functions and features available in PySpark.
Finally, we’ll look at how to handle data stored on the file ...