July 2018
Intermediate to advanced
334 pages
8h 20m
English
We downloaded the Wisconsin Breast Cancer data file into the Chapter2 folder and renamed it bcw.csv. The process of DataFrame creation starts with loading the data.
We will invoke the read method on SparkSession as follows:
scala> val dfReader1 = spark.read
dfReader1: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@3d9dc84d
The read method returns a DataFrameReader. Because our dataset is a CSV file, we tell Spark so by invoking the format method on the DataFrameReader and passing in the com.databricks.spark.csv format specifier string:
scala> val dfReader2 = dfReader1.format("com.databricks.spark.csv")
dfReader2: org.apache.spark.sql.DataFrameReader ...
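The two REPL steps above can also be chained into a single expression. The following is a minimal sketch rather than the book's own listing: the object name BcwReaderSketch, the helper csvReader, and the local master setting are assumptions made for illustration. In spark-shell the spark session already exists, so only the body of csvReader would be needed there.

```scala
import org.apache.spark.sql.{DataFrameReader, SparkSession}

object BcwReaderSketch {
  // Hypothetical helper: chains read and format, mirroring dfReader1 and dfReader2.
  def csvReader(spark: SparkSession): DataFrameReader =
    spark.read.format("com.databricks.spark.csv")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for standalone runs; spark-shell provides `spark` already.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("bcw-reader-sketch")
      .getOrCreate()

    val reader = csvReader(spark)
    // Print the runtime class of the configured reader.
    println(reader.getClass.getName)

    spark.stop()
  }
}
```

Neither read nor format touches the file; both only configure the reader, so this sketch runs without bcw.csv being present.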