Spark resource managers – Standalone, YARN, and Mesos

We have already run Spark applications under the Spark standalone resource manager in other sections of this chapter (within the PySpark shell and as applications). Let's look at how these cluster resource managers differ from each other and when each one should be used.

Local versus cluster mode

Before moving on to cluster resource managers, let's understand how cluster mode is different from local mode.

It is important to understand the scope and life cycle of variables and methods when executing code across a cluster. Let's look at an example with the foreach action:

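counter = 0
rdd = sc.parallelize(data)
# Wrong: each executor increments its own copy of counter, not the driver's
def increment_counter(x):
    global counter
    counter += x
rdd.foreach(increment_counter)
print("Counter value: " + str(counter))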

In ...
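The scoping issue in the foreach example can be imitated without a cluster. The following is a minimal sketch in plain Python, not Spark code: the function run_on_simulated_executor and the driver_state dictionary are hypothetical names introduced only for illustration, and pickle stands in for the way a closure's variables are serialized and copied before they reach a worker. It assumes each "executor" receives a copy of the driver's variables and never the originals.

```python
import pickle

def run_on_simulated_executor(driver_state, partition):
    # Hypothetical stand-in for an executor: it works on a serialized
    # copy of the driver's variables, never on the originals.
    executor_state = pickle.loads(pickle.dumps(driver_state))
    for x in partition:
        executor_state["counter"] += x
    # The copy is discarded afterwards; only the return value survives.
    return executor_state["counter"]

driver_state = {"counter": 0}
partitions = [[1, 2], [3, 4]]  # assumed split of the data across two workers

executor_totals = [run_on_simulated_executor(driver_state, p) for p in partitions]

print("Executor-side totals:", executor_totals)          # [3, 7]
print("Driver-side counter:", driver_state["counter"])   # 0

# Sending partial results back to the driver and combining them there
# gives the intended total.
print("Combined on the driver:", sum(executor_totals))   # 10
```

The driver-side counter stays at 0 because only copies were updated. In Spark itself, the usual ways to get the combined value are an action that returns a result to the driver, such as rdd.sum() or rdd.reduce(), or an accumulator created with sc.accumulator(0).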
