PySpark Cookbook by Tomasz Drabas, Denny Lee

How it works...

As in the previous recipes, we first specify where to download the Spark binaries from and create the global variables we will use later.

Next, we read in the hosts.txt file:

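function readIPs() {
    input="./hosts.txt"

    # Flags marking which section of hosts.txt we are currently reading
    driver=0
    executors=0
    _executors=""

    IFS=''
    while read -r line
    do
        # Skip blank lines before any other check
        if [[ -z "${line}" ]]; then
            continue
        fi
        # The line that follows "driver:" holds the driver node
        if [[ "$driver" = "1" ]]; then
            _driverNode="$line"
            driver=0
        fi
        # Every line that follows "executors:" is appended to the list
        if [[ "$executors" = "1" ]]; then
            _executors=$_executors"$line\n"
        fi
        # Section headers switch the flags for the lines that come next
        if [[ "$line" = "driver:" ]]; then
            driver=1
            executors=0
        fi
        if [[ "$line" = "executors:" ]]; then
            executors=1
            driver=0
        fi
    done < "$input"
}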

We store the path to the file in the input variable. The driver and the executors variables are flags ...
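To make the flag logic concrete, here is a minimal, self-contained sketch of the same idea run against a sample file. It is not the recipe's code: it tracks the current section in a single hypothetical `section` variable instead of two flags, the function name `parse_hosts` and the result variables `driver_node` and `executor_nodes` are made up for this example, and the layout of the sample file (a `driver:` header followed by one address, an `executors:` header followed by one address per line) is an assumption inferred from the headers the function checks for. The IP addresses are placeholders.

```shell
#!/bin/sh
# Sketch only: a single "section" variable stands in for the two flags.

# Write a sample hosts file (the addresses are invented for illustration).
hosts_file="$(mktemp)"
cat > "$hosts_file" <<'EOF'
driver:
192.168.1.160

executors:
192.168.1.115
192.168.1.116
EOF

parse_hosts() {
    section=""          # header seen most recently: "driver" or "executors"
    driver_node=""      # address of the driver
    executor_nodes=""   # space-separated executor addresses

    # "|| [ -n "$line" ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            "")           continue ;;               # ignore blank lines
            "driver:")    section="driver" ;;       # switch section
            "executors:") section="executors" ;;    # switch section
            *)
                case "$section" in
                    driver)    driver_node="$line" ;;
                    executors) executor_nodes="${executor_nodes:+$executor_nodes }$line" ;;
                esac
                ;;
        esac
    done < "$1"
}

parse_hosts "$hosts_file"
rm -f "$hosts_file"

echo "driver: $driver_node"
echo "executors: $executor_nodes"
```

Run as is, the script prints `driver: 192.168.1.160` followed by `executors: 192.168.1.115 192.168.1.116`. Header lines only change the parser's state, and every other non-blank line is stored according to the most recent header, which is the role the driver and executors flags play in readIPs.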
