Broadcast variables are variables shared by the driver node (in our configuration, the node running the IPython notebook) with all the nodes in the cluster. A broadcast variable is read-only: it is sent out by one node, and any change made to it on another node is never read back.
Let's now see how it works in a simple example: we want to one-hot encode a dataset containing just gender information as a string. The dummy dataset contains a single feature that can be male (M), female (F), or unknown (U, if the information is missing). Specifically, we want all the nodes to use the same one-hot encoding, defined in the following dictionary:
In: one_hot_encoding = {"M": (1, 0, 0), "F": (0, 1, 0), "U": (0, 0, 1)}
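Before distributing anything, it can help to check locally what the mapping is expected to produce. The following is a minimal, Spark-free sketch: the helper name `encode_gender`, the sample records, and the choice to fall back to the "U" encoding for unrecognized values are all assumptions made for illustration, not part of the original example.

```python
# Local sketch only: no Spark involved, just the dictionary lookup itself.
one_hot_encoding = {"M": (1, 0, 0), "F": (0, 1, 0), "U": (0, 0, 1)}

def encode_gender(value, mapping):
    """Return the one-hot tuple for `value`.

    Falling back to the "U" encoding for unrecognized values is an
    assumption of this sketch, not something the text prescribes.
    """
    return mapping.get(value, mapping["U"])

# Hypothetical sample records, for illustration only.
sample = ["M", "F", "U", "F", "M", "U"]
encoded = [encode_gender(g, one_hot_encoding) for g in sample]
print(encoded)
```

On a cluster, the same lookup would run inside the function applied on each node, which is why every node needs access to the same dictionary.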
In ...