November 2016
Beginner to intermediate
941 pages
21h 55m
English
When working in a distributed environment, it is sometimes necessary to share information across nodes so that they all operate on consistent values. Spark handles this case by providing two kinds of shared variables: read-only variables (broadcast variables) and write-only variables (accumulators). Because a shared variable is no longer guaranteed to be both readable and writable, Spark can drop the consistency requirement, leaving the hard work of managing this situation to the developer. Usually, a solution is quickly reached, as Spark is flexible and adaptive.
Broadcast variables are variables shared by the driver node, that is, the node running the IPython Notebook in our configuration, with ...