July 2017
Intermediate to advanced
796 pages
18h 55m
English
filter applies a predicate function to each element of the input partitions and keeps only the elements for which it returns true, generating filtered output partitions in the output RDD.
The following snippet shows how we can filter an RDD of a text file to an RDD with only lines containing the word Spark:
scala> val rdd_two = sc.textFile("wiki1.txt")
rdd_two: org.apache.spark.rdd.RDD[String] = wiki1.txt MapPartitionsRDD[8] at textFile at <console>:24

scala> rdd_two.count
res6: Long = 9

scala> rdd_two.first
res7: String = Apache Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way.

scala> val ...
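Because the session above is cut off before the filter call itself, the following is a minimal standalone sketch of the same idea rather than the book's own listing. It assumes a local Spark installation; the object name FilterSketch, the helper linesContaining, and the three sample lines are hypothetical and stand in for wiki1.txt.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object FilterSketch {
  // Hypothetical helper: keep only the lines that contain `word`.
  // filter is a transformation, so this returns a new RDD and runs nothing yet.
  def linesContaining(lines: RDD[String], word: String): RDD[String] =
    lines.filter(line => line.contains(word))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a cluster would use a different master.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("FilterSketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Assumed sample data in place of sc.textFile("wiki1.txt").
    val lines = sc.parallelize(Seq(
      "Apache Spark provides an API centered on the RDD.",
      "An RDD is a read-only multiset of data items.",
      "Spark keeps RDDs fault-tolerant across a cluster."
    ), numSlices = 2)

    val sparkLines = linesContaining(lines, "Spark")

    // Actions such as count and collect trigger the actual computation.
    println(sparkLines.count())
    sparkLines.collect().foreach(println)

    spark.stop()
  }
}
```

Note that contains is case-sensitive, so a line mentioning only "spark" in lowercase would be dropped by this predicate.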