July 2017
Intermediate to advanced
796 pages
18h 55m
English
You create a broadcast variable by calling the SparkContext's broadcast() function on a value of any data type, provided that the value is serializable.
Let's look at how we can broadcast an Integer variable and then use the broadcast variable inside a transformation operation executed on the executors:
scala> val rdd_one = sc.parallelize(Seq(1,2,3))
rdd_one: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[101] at parallelize at <console>:25

scala> val i = 5
i: Int = 5

scala> val bi = sc.broadcast(i)
bi: org.apache.spark.broadcast.Broadcast[Int] = Broadcast(147)

scala> bi.value
res166: Int = 5

scala> rdd_one.take(5)
res164: Array[Int] = Array(1, 2, 3)

scala> rdd_one.map(j => j + bi.value).take(5)
res165: ...
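The same pattern applies to larger serializable values, which is where broadcasting usually pays off. The following is a minimal sketch, not taken from the book, of broadcasting a small lookup Map and reading it inside a map() transformation. The object name BroadcastLookupSketch, the countryNames table, and the local[*] master are assumptions made for illustration; in the spark-shell you would use the existing sc instead of building a session.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the names and data are illustrative only.
object BroadcastLookupSketch {

  def run(): Array[String] = {
    // Assumption: a local session for experimentation.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("broadcast-lookup-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    try {
      // A small, serializable lookup table defined on the driver.
      val countryNames = Map("US" -> "United States", "IN" -> "India")

      // Ship one read-only copy to each executor instead of one per task.
      val lookup = sc.broadcast(countryNames)

      val codes = sc.parallelize(Seq("US", "IN", "XX"))

      // Executors read the broadcast value through .value.
      val labels = codes
        .map(code => lookup.value.getOrElse(code, "unknown"))
        .collect()

      // Release the broadcast data once no further job needs it.
      lookup.destroy()

      labels
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", ")) // prints: United States, India, unknown
}
```

Reading lookup.value inside the closure, rather than referencing countryNames directly, is what lets Spark reuse the copy already cached on each executor.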