Broadcasting
A broadcast variable enables a Spark developer to keep a read-only copy of an instance or class variable cached on each driver program, rather than transferring a copy of its own with the dependent tasks. However, an explicit creation of a broadcast variable is useful only when tasks across multiple stages need the same data in deserialize form.
In Spark application development, using the broadcasting option of SparkContext can reduce the size of each serialized task greatly. This also helps to reduce the cost of initiating a Spark job in a cluster. If you have a certain task in your Spark job that uses large objects from the driver program, you should turn it into a broadcast variable.
To use a broadcast variable in a Spark ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access