O'Reilly logo

Apache Spark for Data Science Cookbook by Padma Priya Chitturi

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Using IPython with PySpark

As Python is the most preferred choice for data scientists due to its high-level syntax and extensive library of packages, Spark developers have considered it for data analysis. The PySpark API has been developed for working with RDDs in Python. IPython Notebook is an essential tool for data scientists to present the scientific and theoretical work in an interactive fashion, integrating both text and Python code.

This recipe shows how to configure IPython with PySpark and also focuses on connecting the IPython shell to PySpark.

Getting ready

To step through this recipe, you need Ubuntu 14.04 (Linux flavor) installed on the machine. Python comes pre-installed. The python --version command gives the version of the Python ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required