July 2018
Intermediate to advanced
334 pages
8h 20m
English
We will get started by invoking flatMap, by passing a function block to it, and successive transformations listed as follows, eventually resulting in Array[(org.apache.spark.ml.linalg.Vector, String)]. A vector represents a row of feature measurements.
The Scala code to give us Array[(org.apache.spark.ml.linalg.Vector, String)] is as follows:
//Each line in the RDD is a row in the Dataset represented by a String, which we can 'split' along the new //line characterval result2: RDD[String] = result1.flatMap { partition => partition.split("\n").toList }//the second transformation operation involves a split inside of each line in the dataset where there is a //comma separating ...Read now
Unlock full access