Read-only broadcast variables

Broadcast variables are variables that the driver node (in our configuration, the node running the IPython notebook) shares with all the nodes in the cluster. A broadcast variable is read-only: it is broadcast by one node, and if another node changes its copy, the change is never read back.

Let's see how this works with a simple example: we want to one-hot encode a dataset containing only gender information as a string. The dummy dataset has a single feature that can be male (M), female (F), or unknown (U, if the information is missing). Specifically, we want all the nodes to use the same one-hot encoding, defined in the following dictionary:

In: one_hot_encoding = {"M": (1, 0, 0),
                        "F": (0, 1, 0),
                        "U": (0, 0, 1)}
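As a rough sketch of how such a dictionary might be shared, the following wraps it in a broadcast variable with `sc.broadcast` and reads it on the workers through `.value`. The helper names `encode_gender` and `encode_with_spark` are hypothetical, and `sc` is assumed to be an already running SparkContext; this is an illustration under those assumptions, not necessarily the approach the text goes on to use.

```python
# Mapping taken from the text above.
one_hot_encoding = {"M": (1, 0, 0), "F": (0, 1, 0), "U": (0, 0, 1)}


def encode_gender(value, encoding):
    """Hypothetical helper: return the one-hot tuple for one gender code."""
    return encoding[value]


def encode_with_spark(sc, values):
    """Hypothetical helper: encode `values` on the cluster.

    Assumes `sc` is an existing pyspark SparkContext.
    """
    # Ship the dictionary to the nodes as a read-only broadcast variable.
    broadcast_map = sc.broadcast(one_hot_encoding)
    # Each worker reads its local copy of the dictionary through .value
    return (sc.parallelize(values)
              .map(lambda v: encode_gender(v, broadcast_map.value))
              .collect())


# Local check of the per-record logic; no cluster is needed for this part.
local_result = [encode_gender(v, one_hot_encoding) for v in ["M", "F", "U", "F"]]
```

With a live SparkContext, `encode_with_spark(sc, ["M", "F", "U", "F"])` should return the same tuples as `local_result`, since the workers only look values up in their copy of the dictionary and never modify it.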

In ...
