July 2017
Intermediate to advanced
796 pages
18h 55m
English
You can also destroy a broadcast variable, which removes it completely from all executors and from the driver and makes it inaccessible. This can help free memory and disk space across the cluster once the variable is no longer needed.
Calling destroy() on a broadcast variable removes all data and metadata related to that variable. Once a broadcast variable has been destroyed, it cannot be used again; if you need the value later, you have to create a new broadcast variable.
The following is an example of destroying broadcast variables:
scala> val rdd_one = sc.parallelize(Seq(1,2,3))
rdd_one: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[101] at parallelize at <console>:25

scala> val k = 5
k: Int = 5

scala> val bk = sc.broadcast(k)
bk: org.apache.spark.broadcast.Broadcast[Int] ...
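The transcript above is cut off before the destroy() call. The following is a minimal sketch of how the full sequence might look as a standalone program, not the book's own listing. It assumes Spark 2.x or later on the classpath and a local master; the object name DestroyBroadcastSketch, the application name, and the variable scaled are illustrative. It reuses rdd_one, k, and bk from the example, uses the broadcast value in a job, destroys it, and then shows that a later read fails.

```scala
import org.apache.spark.SparkException
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object DestroyBroadcastSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally; in spark-shell, `sc` already exists.
    val spark = SparkSession.builder()
      .appName("destroy-broadcast-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val rdd_one = sc.parallelize(Seq(1, 2, 3))
    val k = 5
    val bk = sc.broadcast(k)

    // While the broadcast variable is valid, tasks can read bk.value.
    val scaled = rdd_one.map(j => j + bk.value).collect()
    println(scaled.mkString(", "))

    // Remove the broadcast data and metadata from the executors and the driver.
    bk.destroy()

    // Reading a destroyed broadcast variable throws a SparkException.
    try {
      bk.value
    } catch {
      case e: SparkException =>
        println(s"Broadcast is no longer usable: ${e.getMessage}")
    }

    spark.stop()
  }
}
```

Referencing bk inside a new RDD operation after destroy() fails in the same way, so destroy a broadcast variable only after every job that depends on it has finished.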