© Robert Ilijason  2020
R. Ilijason, Beginning Apache Spark Using Azure Databricks, https://doi.org/10.1007/978-1-4842-5781-4_7

7. The Power of Python

Robert Ilijason, Viken, Sweden

Python has quickly become one of the most important tools in the data science and data engineering communities. This chapter digs deeper into how you can use the language together with the Apache Spark DataFrames API to work with data efficiently.

We’ll discuss what Python is and take a quick look at how the language is structured. We will then revisit DataFrames and learn how to explore data using some of their built-in features.

With our datasets in place, we’ll learn how to pick up data, filter the information we want, and run different functions to get the results ...
