July 2017
Intermediate to advanced
796 pages
18h 55m
English
map applies a transformation function to each element of the input partitions, producing the corresponding output partitions of the output RDD.
The following snippet shows how to map an RDD created from a text file to an RDD containing the length of each line of text:
scala> val rdd_two = sc.textFile("wiki1.txt")
rdd_two: org.apache.spark.rdd.RDD[String] = wiki1.txt MapPartitionsRDD[8] at textFile at <console>:24

scala> rdd_two.count
res6: Long = 9

scala> rdd_two.first
res7: String = Apache Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way.

scala> val rdd_three ...
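The session above needs a running SparkContext. As a cluster-free illustration of the partition-wise behaviour of map, here is a minimal sketch that models an RDD as a vector of partitions. `ToyRDD`, `fromLines`, and `LineLengthsDemo` are hypothetical names introduced only for this example. They are not Spark APIs, and the simple chunking below is an assumption: real Spark splits `textFile` input by file splits, not like this.

```scala
// Hypothetical model: an "RDD" pictured as a vector of partitions,
// each partition holding a vector of elements. Not Spark's RDD class.
final case class ToyRDD[T](partitions: Vector[Vector[T]]) {

  // Apply `f` to every element inside every partition. The result has the
  // same number of partitions, and each output partition is derived only
  // from the corresponding input partition.
  def map[U](f: T => U): ToyRDD[U] =
    ToyRDD(partitions.map(partition => partition.map(f)))

  // Total number of elements across all partitions (analogous to rdd.count).
  def count: Long = partitions.map(_.size.toLong).sum

  // Flatten all partitions back into one local vector (analogous to rdd.collect).
  def collect: Vector[T] = partitions.flatten
}

object ToyRDD {
  // Split local lines into at most `numPartitions` roughly equal chunks.
  // This is an assumed, simplified stand-in for how sc.textFile partitions input.
  def fromLines(lines: Seq[String], numPartitions: Int): ToyRDD[String] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val chunkSize = math.max(1, math.ceil(lines.size.toDouble / numPartitions).toInt)
    ToyRDD(lines.toVector.grouped(chunkSize).toVector)
  }
}

object LineLengthsDemo {
  // Hypothetical sample lines standing in for the contents of a text file.
  val lines: Seq[String] = Seq("Apache Spark", "resilient distributed dataset", "RDD")

  // Three lines in two partitions: Vector(line1, line2) and Vector(line3).
  val rddTwo: ToyRDD[String] = ToyRDD.fromLines(lines, numPartitions = 2)

  // Map each line to its length, partition by partition.
  val lengths: ToyRDD[Int] = rddTwo.map(_.length)

  def main(args: Array[String]): Unit =
    println(lengths.partitions) // prints Vector(Vector(12, 29), Vector(3))
}
```

In Spark, the same transformation is expressed with `RDD.map`, for example `rdd_two.map(_.length)`, which returns an `RDD[Int]` with the same number of partitions as `rdd_two`. Because each output partition depends on exactly one input partition, map needs no shuffle.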