
Learning Spark, 2nd Edition by Tathagata Das, Brooke Wenig, Denny Lee, Jules Damji


Chapter 2. Downloading Apache Spark and Getting Started

In this chapter, we will walk through three simple steps to download Apache Spark 2.x and get you started—whether you are a data scientist or a data engineer—writing your first standalone application.

We will use local mode, one of the easiest ways to learn Spark, because it gives you a quick feedback loop for iteratively performing Spark operations. The Spark shells are a fast way to prototype or test Spark operations on small datasets before you write a complex Spark application.

While the Spark shell supports only Scala and Python, you can write a Spark application in any of the supported languages—Scala, Java, Python, and R—and issue queries in Spark SQL. We do expect some familiarity with the language of your choice.
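As a taste of that quick feedback loop, here is a sketch of a short Scala shell session (this assumes you have already launched spark-shell from an installed Spark distribution, as described in the steps below; README.md is the file that ships in the Spark directory):

```
scala> val strings = spark.read.text("README.md")
scala> strings.count()
```

Each expression is evaluated immediately, so you can inspect intermediate results and refine your operations one line at a time.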

Step 1: Download Apache Spark

First, go to http://spark.apache.org/downloads.html, select “Pre-built for Apache Hadoop 2.7 and later” from the drop-down menu, and click link 3, “Download Spark” (Figure 2-1).

Figure 2-1. Download page on spark.apache.org

This will download the tarball spark-2.4.0-bin-hadoop2.7.tgz, which contains all the Hadoop-related binaries you will need to run Spark in local mode on your laptop. If you are going to install Spark on an existing HDFS or Hadoop installation, select the matching Hadoop version from the drop-down menu instead. ...
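Once the download finishes, the remaining setup is a matter of extracting the tarball and locating the shells under the bin/ directory. A minimal sketch, assuming the tarball spark-2.4.0-bin-hadoop2.7.tgz (your version may differ) sits in the current directory; the guard skips extraction if the file is absent:

```shell
# Name of the downloaded tarball (adjust to the version you downloaded).
TARBALL=spark-2.4.0-bin-hadoop2.7.tgz

# Stripping the .tgz suffix gives the directory the archive unpacks into.
SPARK_DIR="${TARBALL%.tgz}"

# Extract only if the tarball is actually present.
if [ -f "$TARBALL" ]; then
  tar -xzf "$TARBALL"
fi

echo "Spark directory: $SPARK_DIR"
# The shells live under bin/ inside that directory:
#   $SPARK_DIR/bin/spark-shell   (Scala shell)
#   $SPARK_DIR/bin/pyspark       (Python shell)
```

From there, running ./bin/spark-shell or ./bin/pyspark inside the extracted directory starts an interactive session in local mode.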
