PySpark

To use Spark's functionality (or PySpark, the Python API for Spark), we need to instantiate a special object named SparkContext. It tells Spark how to access the cluster and holds some application-specific parameters. In the Jupyter Notebook provided in the virtual machine, this variable is already available under the name sc (it is created by default when an IPython Notebook is started); let's see what it contains in the next section.
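
If you are working outside the preconfigured notebook, where sc is not created for you, the following minimal sketch shows how a SparkContext could be instantiated by hand. The application name and master URL here are illustrative assumptions, not values taken from the book's setup:

    # A minimal sketch, assuming PySpark is installed and no SparkContext
    # exists yet. The app name and master URL are illustrative choices.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("my-first-pyspark-app")  # application-specific parameter
            .setMaster("local[*]"))              # run locally, using all available cores

    sc = SparkContext.getOrCreate(conf=conf)

    print(sc.version)  # Spark version the context is connected to
    print(sc.master)   # how Spark accesses the cluster (here: local mode)

    sc.stop()          # release resources when the application is done

Using getOrCreate rather than the plain constructor avoids an error if a context (such as the notebook's sc) already exists in the session.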
