
Scala Data Analysis Cookbook by Arun Manivannan

Chapter 6. Scaling Up

In this chapter, we will cover the following recipes:

  • Building the Uber JAR
  • Submitting jobs to the Spark cluster (local)
  • Running the Spark standalone cluster on EC2
  • Running the Spark job on Mesos (local)
  • Running the Spark job on YARN (local)

Introduction

In this chapter, we'll be looking at how to bundle our Spark application and deploy it on various distributed environments.

As we discussed earlier in Chapter 3, Loading and Preparing Data – DataFrame, the foundation of Spark is the RDD. From a programmer's perspective, the fact that RDDs compose much like regular Scala collections is a huge advantage. An RDD wraps three vital (and two subsidiary) pieces of information that help in reconstructing its data, which is what enables fault tolerance. ...
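To make the composability point concrete, here is a minimal sketch that chains filter, map, and reduce on a plain Scala List. The object name, the sample data, and the pass mark of 50 are all hypothetical and are not taken from the book. The sketch deliberately avoids a Spark dependency so that it runs on its own; the comments note where an RDD would slot in.

```scala
// Plain Scala collections only; no Spark dependency is needed to run this.
object ComposabilitySketch {

  // Hypothetical sample data, chosen purely for illustration.
  val marks: List[Int] = List(35, 82, 67, 91, 48)

  // Keep the marks of 50 or more, add 5 to each, then sum them.
  // Each step returns a new collection, so the steps chain freely.
  // Note: reduce throws on an empty collection, so this assumes
  // at least one mark survives the filter.
  def totalOfCurvedPasses(xs: List[Int]): Int =
    xs.filter(_ >= 50)
      .map(_ + 5)
      .reduce(_ + _)

  def main(args: Array[String]): Unit = {
    // With a SparkContext `sc` in scope (an assumption; none is created here),
    // sc.parallelize(marks) would give an RDD[Int] that accepts the same
    // filter / map / reduce chain.
    println(totalOfCurvedPasses(marks)) // prints 255
  }
}
```

On an RDD, filter and map are lazy transformations and reduce is the action that triggers the computation, but the shape of the code stays the same as for the local collection.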
