PySpark

To use Spark's functionality (or PySpark, the Python API for Spark), we need to instantiate a special object named SparkContext. It tells Spark how to access the cluster and holds some application-specific parameters. In the Jupyter Notebook provided in the virtual machine, this variable is already available under the name sc (it's the default when an IPython Notebook is started); let's see what it contains in the next section.
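If you are working outside the provided virtual machine, where sc is not pre-instantiated, the context can be created by hand. The following is a minimal sketch, assuming a local Spark installation; the master URL "local[*]" and the application name are illustrative choices, not values required by the book's setup:

```python
from pyspark import SparkConf, SparkContext

# Application-specific parameters: where the cluster is and what to call the app
conf = (SparkConf()
        .setMaster("local[*]")            # run locally, using all available cores (assumption)
        .setAppName("MyFirstPySparkApp")) # hypothetical application name

# The SparkContext tells Spark how to access the cluster
sc = SparkContext(conf=conf)

# A few of the parameters the context holds
print(sc.version)   # Spark version in use
print(sc.appName)   # the application name set above
print(sc.master)    # the master URL

sc.stop()           # release cluster resources when done
```

In the notebook environment described here you should not create a second SparkContext, since only one can be active per process; simply use the existing sc variable.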
