Python has quickly become one of the most important tools in the data science and data engineering communities. This chapter digs deeper into how you can use the language together with the Apache Spark DataFrames API to work with data efficiently.
We’ll discuss what Python is and take a quick look at how the language is structured. We’ll then revisit DataFrames and learn how to explore data using some of the built-in features.
With our datasets in place, we’ll learn how to select data, filter the information we want, and run different functions to get the results ...