Chapter 16. Developing Spark Applications

In Chapter 15, you learned how Spark runs your code on the cluster. We’ll now show you how to develop a standalone Spark application and deploy it on a cluster. We’ll do this using a simple template that shares some tips for structuring your applications, including setting up build tools and unit testing. This template is available in the book’s code repository. The template is not strictly necessary, because writing applications from scratch isn’t hard, but it helps. Let’s get started with our first application.

Writing Spark Applications

Spark applications are the combination of two things: a Spark cluster and your code. In this case, the cluster will run in local mode and the application will be a predefined one. Let’s walk through an application in each language.

A Simple Scala-Based App

Scala is Spark’s “native” language and naturally makes for a great way to write applications. Writing a Spark application in Scala is really no different from writing any other Scala application.
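
As a rough illustration of that point, a minimal Scala application running in local mode might look like the sketch below. The object name `SimpleApp`, the helper `countEvens`, and the toy computation are hypothetical choices made for illustration and are not taken from the book’s template. The sketch also assumes that the Spark SQL library is already on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name is an assumption, not part of the book's template.
object SimpleApp {

  // Hypothetical helper: counts the even numbers in the range [0, n).
  def countEvens(spark: SparkSession, n: Long): Long =
    spark.range(n).filter("id % 2 = 0").count()

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in local mode, using all available cores on this machine.
    val spark = SparkSession.builder()
      .appName("SimpleApp")
      .master("local[*]")
      .getOrCreate()

    val evens = countEvens(spark, 1000L)
    println(s"Even numbers below 1000: $evens")

    // Release the resources held by the session before exiting.
    spark.stop()
  }
}
```

Apart from creating and stopping the `SparkSession`, this is an ordinary Scala program with a `main` method. Hardcoding the master keeps the sketch self-contained; for a real deployment it is usually left out of the code and supplied at submission time instead.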


Scala can seem intimidating, depending on your background, but it’s worth learning if only to understand Spark a bit better. You also do not need to learn all of the language’s ins and outs; begin with the basics and you’ll see that it’s easy to be productive in Scala in no time. Using Scala will also open up a lot of doors. With a little practice, it’s not too difficult to do code-level tracing through Spark’s codebase.

You can build applications using ...
