A Spark program can run by itself or over cluster managers. The first option is similar to running a program locally with multiple threads, where each thread is treated as one Spark worker. With a single thread there is, of course, no parallelism at all, but this is a quick and easy way to launch a Spark application, and we will be deploying applications in this mode, by way of demonstration, throughout the chapter. For example, we can run the following script to launch a Spark application:
./bin/spark-submit examples/src/main/python/pi.py
This is exactly what we did in the previous section. Alternatively, we can specify the number of worker threads:
./bin/spark-submit --master local[4] examples/src/main/python/pi.py
In the preceding code, we run Spark locally with four worker threads; we can also write local[*] to use as many worker threads as there are logical cores on the machine.
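To make the example concrete, the following is a minimal sketch of the kind of Monte Carlo pi estimation the bundled pi.py example performs (a simplified version for illustration; the actual file shipped with Spark also reads the number of partitions from the command line):

from random import random
from operator import add

from pyspark.sql import SparkSession

# spark-submit injects the master URL (for example, local[4]),
# so the same script runs unchanged in local mode or on a cluster
spark = SparkSession.builder.appName("PythonPi").getOrCreate()

partitions = 2
n = 100000 * partitions

def inside(_):
    # Sample a point uniformly from the 2x2 square centered at the
    # origin; it falls inside the unit circle with probability pi/4
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0

# Each partition becomes a task; in local[4] mode, up to four tasks
# run concurrently, one per worker thread
count = spark.sparkContext.parallelize(range(1, n + 1), partitions) \
    .map(inside) \
    .reduce(add)

print("Pi is roughly %f" % (4.0 * count / n))
spark.stop()

Saving this as, say, my_pi.py (an illustrative filename) and submitting it with either of the spark-submit commands shown previously would print an approximation of pi.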