July 2017
Intermediate to advanced
796 pages
18h 55m
English
PySpark uses a Python-based SparkContext and runs Python scripts as tasks. It then uses sockets and pipes to the launched processes to communicate between the Java-based Spark cluster and those Python scripts. PySpark also relies on Py4J, a popular library integrated into PySpark that lets Python interface dynamically with Java-based RDDs.
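To make the pipe-based exchange described above more concrete, here is a minimal standard-library sketch of a parent process handing batches of data to a separately launched Python worker and reading the results back. It is not PySpark's actual wire protocol: the names `WORKER_SOURCE` and `run_partitions`, the 4-byte length prefix, and the squaring task are all assumptions chosen for illustration, and a plain Python parent stands in for the Java side.

```python
import pickle
import struct
import subprocess
import sys

# Hypothetical worker script (an assumption, not PySpark's real worker):
# it reads length-prefixed pickled batches from stdin, squares each element,
# and writes the length-prefixed pickled result to stdout.
WORKER_SOURCE = r"""
import pickle, struct, sys

def read_frame(stream):
    header = stream.read(4)
    if len(header) < 4:          # EOF: the parent closed the pipe
        return None
    (size,) = struct.unpack(">I", header)
    return pickle.loads(stream.read(size))

def write_frame(stream, obj):
    payload = pickle.dumps(obj)
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)
    stream.flush()

while True:
    batch = read_frame(sys.stdin.buffer)
    if batch is None:
        break
    write_frame(sys.stdout.buffer, [x * x for x in batch])
"""


def run_partitions(partitions):
    """Send each partition to a worker process over pipes and collect results."""
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    results = []
    try:
        for part in partitions:
            # Frame = 4-byte big-endian length followed by the pickled batch.
            payload = pickle.dumps(part)
            worker.stdin.write(struct.pack(">I", len(payload)) + payload)
            worker.stdin.flush()
            # Read the worker's reply using the same framing.
            (size,) = struct.unpack(">I", worker.stdout.read(4))
            results.append(pickle.loads(worker.stdout.read(size)))
    finally:
        worker.stdin.close()     # signals EOF so the worker loop exits
        worker.wait()
    return results


if __name__ == "__main__":
    print(run_partitions([[1, 2], [3]]))
```

The sketch keeps one long-lived worker and streams several batches through it, which is loosely the pattern the paragraph describes: serialized data crosses a process boundary, a Python script does the work, and the serialized result travels back the same way.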
The following shows how PySpark works by communicating between Java processes and Python scripts:
