In this chapter, we will focus on the ease of downloading Spark 2.x and walk through three simple steps for you as a developer—whether a data scientist or data engineer—to get started writing your first standalone application.
We will use local mode, one of the easiest ways to learn Spark, because it gives you a quick feedback loop for iteratively performing Spark operations. Using a Spark shell is a quick way to prototype or test Spark operations on small datasets before writing a complex Spark application.
While the Spark shell supports only Scala and Python, you can write a Spark application in any of the supported languages—Scala, Java, Python, and R—and issue queries in Spark SQL. We do expect some familiarity with the language of your choice in order to use Spark.
First, go to http://spark.apache.org/downloads.html, select “Pre-built for Apache Hadoop 2.7 and later” from the drop-down menu, and click the “Download Spark” link in step 3 (Fig. 2-1).
This will download the tarball spark-2.4.0-bin-hadoop2.7.tgz, which contains all the Hadoop-related binaries you will need to run Spark in local mode on your laptop. If you intend to install Spark on an existing HDFS or Hadoop cluster instead, select the matching Hadoop version from the drop-down menu. ...
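Once the download finishes, the steps above can be sketched from a terminal as follows. This is a minimal sketch assuming the tarball landed in your current directory; the exact filename will differ if you picked a different Spark or Hadoop version.

```shell
# Extract the tarball (adjust the name to match your download)
tar xf spark-2.4.0-bin-hadoop2.7.tgz
cd spark-2.4.0-bin-hadoop2.7

# Start the Scala shell ...
./bin/spark-shell

# ... or the Python shell
./bin/pyspark
```

The extracted directory becomes your Spark home; the `bin/` subdirectory also contains `spark-submit`, which you will use later to launch standalone applications.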