July 2017
Intermediate to advanced
796 pages
18h 55m
English
We will now show how to predict the possibility of breast cancer with a step-by-step example:
Step 1: Load and parse the data
val rdd = spark.sparkContext.textFile("data/wbcd.csv")
val cancerRDD = parseRDD(rdd).map(parseCancer)
The parseRDD() method goes as follows:
import org.apache.spark.rdd.RDD

def parseRDD(rdd: RDD[String]): RDD[Array[Double]] = {
  rdd.map(_.split(","))
    .filter(_(6) != "?")
    .map(_.drop(1))
    .map(_.map(_.toDouble))
}
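To see what each stage of parseRDD() does, the same chain can be tried on an ordinary in-memory Seq, since Scala collections offer the same map, filter, and drop calls. This is only a sketch: the object name ParseSketch, the helper parseLines, and the three sample rows are made up for illustration and are not taken from wbcd.csv. It assumes the usual 11-column layout of the Wisconsin breast cancer data, where the first column is a sample ID and the last is the class label.

```scala
// Plain-Scala sketch of the parseRDD() pipeline on a local Seq (no Spark needed).
// ParseSketch, parseLines, and the sample rows are hypothetical.
object ParseSketch {
  def parseLines(lines: Seq[String]): Seq[Array[Double]] =
    lines
      .map(_.split(","))        // split each CSV line into its fields
      .filter(_(6) != "?")      // skip rows whose field at index 6 is missing ("?")
      .map(_.drop(1))           // drop the first column (assumed to be the sample ID)
      .map(_.map(_.toDouble))   // convert the remaining fields to Double

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "1001,5,1,1,1,2,1,3,1,1,2",
      "1002,8,4,5,1,2,?,7,3,1,4", // filtered out: "?" at index 6
      "1003,3,1,1,1,2,2,3,1,1,2"
    )
    // Each surviving row has 10 values: 9 features followed by the class label.
    parseLines(sample).foreach(row => println(row.mkString(", ")))
  }
}
```

Note that the filter runs before drop(1), so index 6 refers to the original column positions, while the indices used later in parseCancer() refer to the row after the first column has been removed.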
The parseCancer() method is as follows:
def parseCancer(line: Array[Double]): Cancer = {
  Cancer(if (line(9) == 4.0) 1 else 0,
    line(0), line(1), line(2), line(3), line(4),
    line(5), line(6), line(7), line(8))
}
Note that here we have simplified the dataset. For the value 4.0, we have converted them ...
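The Cancer class itself is not shown in this excerpt. The sketch below is one possible definition that matches the ten constructor arguments used in parseCancer(): a label followed by nine features. All field names are assumptions chosen for illustration, and the label is typed as Double here, which may differ from the original definition. In the original UCI data the class column uses 2 for benign and 4 for malignant, so the code maps 4.0 to 1 and any other value to 0.

```scala
// Hypothetical definition: the field names and types are assumptions,
// not taken from the source text.
case class Cancer(
  cancerClass: Double, // 1.0 if the original label was 4.0, otherwise 0.0
  thickness: Double,
  size: Double,
  shape: Double,
  adhesion: Double,
  epithelialSize: Double,
  bareNuclei: Double,
  blandChromatin: Double,
  normalNucleoli: Double,
  mitoses: Double
)

object CancerSketch {
  // Same logic as parseCancer() in the text, written against the class above.
  def parseCancer(line: Array[Double]): Cancer =
    Cancer(
      if (line(9) == 4.0) 1.0 else 0.0, // binarize the class label
      line(0), line(1), line(2), line(3), line(4),
      line(5), line(6), line(7), line(8)
    )

  def main(args: Array[String]): Unit = {
    // Made-up rows in the shape produced by parseRDD(): 9 features, then the label.
    val malignant = Array(8.0, 4.0, 5.0, 1.0, 2.0, 9.0, 7.0, 3.0, 1.0, 4.0)
    val benign    = Array(5.0, 1.0, 1.0, 1.0, 2.0, 1.0, 3.0, 1.0, 1.0, 2.0)
    println(parseCancer(malignant).cancerClass)
    println(parseCancer(benign).cancerClass)
  }
}
```

A binary 0/1 label of this kind is the form that Spark's binary classifiers typically expect, which is presumably why the two original class codes are collapsed at parse time.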