O'Reilly logo

Learning PySpark by Denny Lee, Tomasz Drabas

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Deploying the app programmatically

Unlike the Jupyter notebooks, when you use the spark-submit command, you need to prepare the SparkSession yourself and configure it so your application runs properly.

In this section, we will learn how to create and configure the SparkSession as well as how to use modules external to Spark.

Note

If you have not created your free account with either Databricks or Microsoft (or any other provider of Spark) do not worry - we will be still using your local machine as this is easier to get us started. However, if you decide to take your application to the cloud it will literally only require changing the --master parameter when you submit the job.

Configuring your SparkSession

The main difference between using Jupyter ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required