May 2017
Intermediate to advanced
270 pages
6h 18m
English
Setting up PySpark from scratch requires the installation of the Java and Scala runtimes, the compilation of the project from source, and the configuration of Python and Jupyter notebook so that they can be used alongside the Spark installation. An easier and less error-prone way to set up PySpark is to use an already configured Spark cluster made available through a Docker container.
To set up a Spark cluster, go to the directory containing this chapter's code files (where a file named Dockerfile is located) and issue the following command:
$ docker build -t pyspark .
This command ...
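For readers who prefer to script this step, the following is a minimal sketch of the same build invoked from Python. The helper names `docker_build_command` and `build_image` are hypothetical and not part of the chapter's code; the sketch assumes only that Docker is installed and that the target directory contains a file named Dockerfile, as described above.

```python
import subprocess
from pathlib import Path


def docker_build_command(context_dir, tag="pyspark"):
    """Return the argument list for `docker build -t <tag> <context_dir>`.

    Hypothetical helper: it only assembles the command and checks that
    a Dockerfile is present, so a wrong directory fails early.
    """
    context = Path(context_dir)
    if not (context / "Dockerfile").is_file():
        raise FileNotFoundError(f"No Dockerfile found in {context}")
    # The last argument is the build context: the directory Docker
    # reads the Dockerfile (and any files it copies) from.
    return ["docker", "build", "-t", tag, str(context)]


def build_image(context_dir=".", tag="pyspark"):
    """Run the build; raises CalledProcessError if Docker reports failure."""
    subprocess.run(docker_build_command(context_dir, tag), check=True)
```

Calling `build_image(".")` from the directory that holds the Dockerfile should be equivalent to running the shell command shown above.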