Skip to Content
Learning Spark
book

Learning Spark

by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia
February 2015
Intermediate to advanced
276 pages
7h 18m
English
O'Reilly Media, Inc.
Content preview from Learning Spark

Chapter 7. Running on a Cluster

Introduction

Up to now, we’ve focused on learning Spark by using the Spark shell and examples that run in Spark’s local mode. One benefit of writing applications on Spark is the ability to scale computation by adding more machines and running in cluster mode. The good news is that writing applications for parallel cluster execution uses the same API you’ve already learned in this book. The examples and applications you’ve written so far will run on a cluster “out of the box.” This is one of the benefits of Spark’s higher level API: users can rapidly prototype applications on smaller datasets locally, then run unmodified code on even very large clusters.

This chapter first explains the runtime architecture of a distributed Spark application, then discusses options for running Spark in distributed clusters. Spark can run on a wide variety of cluster managers (Hadoop YARN, Apache Mesos, and Spark’s own built-in Standalone cluster manager) in both on-premise and cloud deployments. We’ll discuss the trade-offs and configurations required for running in each case. Along the way we’ll also cover the “nuts and bolts” of scheduling, deploying, and configuring a Spark application. After reading this chapter you’ll have everything you need to run a distributed Spark program. The following chapter will cover tuning and debugging applications.

Spark Runtime Architecture

Before we dive into the specifics of running Spark on a cluster, it’s helpful to understand ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Learning Spark, 2nd Edition

Learning Spark, 2nd Edition

Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee
Learning PySpark

Learning PySpark

Tomasz Drabas, Denny Lee
Spark: The Definitive Guide

Spark: The Definitive Guide

Bill Chambers, Matei Zaharia
High Performance Spark, 2nd Edition

High Performance Spark, 2nd Edition

Holden Karau, Adi Polak, Rachel Warren

Publisher Resources

ISBN: 9781449359034Errata PageSupplemental Content