July 2017
Intermediate to advanced
796 pages
18h 55m
English
collect() simply collects all elements in the RDD and sends them to the Driver.
Shown here is an illustration of what the collect function essentially does. When you call collect on an RDD, the Driver pulls all the elements of the RDD from the executors into its own memory.
Shown below is the code to collect the contents of the RDD and display them:
scala> rdd_two.collect
res25: Array[String] = Array(Apache Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way., ...
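To make the "pull everything into the Driver" step concrete, here is a minimal single-JVM sketch of what collect does, written in plain Scala with no Spark dependency. It is an illustrative model, not Spark's API: the ToyRDD class and the standalone collect function are hypothetical names introduced here, and an "RDD" is assumed to be just a sequence of partitions. In broad strokes, Spark's own RDD.collect follows the same two steps: each partition is turned into an array on its executor, and the Driver then concatenates those arrays in partition order.

```scala
// Simplified model of RDD.collect(), for illustration only.
// Assumption: an "RDD" is modelled as a Seq of partitions (one Seq per partition);
// in real Spark the partitions live on executors spread across a cluster.
import scala.reflect.ClassTag

object CollectModel {

  // Hypothetical stand-in for an RDD: just the list of its partitions.
  final case class ToyRDD[T](partitions: Seq[Seq[T]])

  def collect[T: ClassTag](rdd: ToyRDD[T]): Array[T] = {
    // Step 1 ("executor" side): turn each partition's iterator into an array.
    val perPartition: Seq[Array[T]] = rdd.partitions.map(p => p.iterator.toArray)
    // Step 2 ("driver" side): concatenate the per-partition arrays in partition
    // order, so the driver ends up holding every element at once.
    Array.concat(perPartition: _*)
  }

  def main(args: Array[String]): Unit = {
    val rdd = ToyRDD(Seq(Seq("a", "b"), Seq("c"), Seq("d", "e")))
    println(collect(rdd).mkString(", ")) // prints: a, b, c, d, e
  }
}
```

Because the Driver must hold the entire concatenated array, collect is only appropriate for RDDs small enough to fit in the Driver's memory; for a quick look at a large RDD, take(n) returns just the first n elements instead.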