July 2018
Intermediate to advanced
334 pages
8h 20m
English
In this step, we will load the data again, but in a slightly different manner. The goal of this phase of the data analysis is to produce a DataFrame from data that is first read into an RDD[String]. To begin, we need a path to the dataset:
scala> val dataSetPath = "C:\\Users\\Ilango\\Documents\\Packt\\DevProjects\\Chapter2\\"
dataSetPath: String = C:\Users\Ilango\Documents\Packt\DevProjects\Chapter2\
We have just created dataSetPath. In the following code, we pass the path to the dataset into the textFile method:
scala> val firstRDD = spark.sparkContext.textFile(dataSetPath + "\\bcw.csv")
firstRDD: org.apache.spark.rdd.RDD[String] = C:\<<path to your dataset file>> MapPartitionsRDD[1] ...
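Outside the spark-shell there is no ready-made spark value, so the same two steps need a session of their own. The following is a minimal sketch of that, not the book's own listing. It assumes Spark is on the classpath, and the object name LoadBcwAsRdd, the local[*] master, and the convention of passing the dataset folder as the first program argument are all illustrative choices. Because textFile is lazy, nothing is read from disk until an action such as take runs.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone wrapper around the two REPL steps above.
object LoadBcwAsRdd {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation; spark-shell
    // already provides an equivalent value named `spark`.
    val spark = SparkSession.builder()
      .appName("LoadBcwAsRdd")
      .master("local[*]")
      .getOrCreate()

    // Assumption: the dataset folder arrives as the first argument and
    // ends with a path separator, as dataSetPath does in the text.
    val dataSetPath: String = args.headOption.getOrElse("./")

    // Each element of the resulting RDD[String] is one line of the file.
    val firstRDD = spark.sparkContext.textFile(dataSetPath + "bcw.csv")

    // take is an action, so this is the point where the file is read.
    firstRDD.take(5).foreach(println)

    spark.stop()
  }
}
```

Printing a handful of lines with take is a cheap way to confirm that the path resolves and that the rows look as expected before converting the RDD into a DataFrame.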