March 2019
Beginner to intermediate
182 pages
4h 6m
English
In this section, we will cover text data in a tabular format: CSV. The following topics will be covered:
Saving CSV files is more involved than saving JSON or plain text because we need to specify whether to retain the header row of our data in the CSV file.
First, we will create a DataFrame:
test("should save and load CSV with header") {
  //given
  import spark.sqlContext.implicits._
  val rdd = spark.sparkContext
    .makeRDD(List(UserTransaction("a", 100), UserTransaction("b", 200)))
    .toDF()
Then, we will write the DataFrame using the CSV format. We also need to set the header option, which specifies whether the column names are written as the first row of the file:
  //when
  rdd.coalesce(1)
    .write
    .format("csv")
    .option("header", ...
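To see why the header setting matters, here is a minimal, self-contained sketch of a write followed by two reads. It is not the book's test. The field names of UserTransaction (userId, amount), the object name CsvHeaderRoundTrip, the helper writeAndRead, and the temporary output path are all assumptions made for illustration, and the header value "true" is chosen here only to demonstrate the effect.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumed shape of the UserTransaction type used in the text.
case class UserTransaction(userId: String, amount: Int)

object CsvHeaderRoundTrip {

  // Hypothetical helper: writes two rows as CSV with a header row, then
  // reads the result back once with and once without the header option.
  def writeAndRead(spark: SparkSession, path: String): (DataFrame, DataFrame) = {
    import spark.implicits._

    val df = Seq(UserTransaction("a", 100), UserTransaction("b", 200)).toDF()

    // coalesce(1) produces a single part file; header=true writes column names first.
    df.coalesce(1)
      .write
      .format("csv")
      .option("header", "true")
      .save(path)

    // With header=true the first row becomes the column names.
    val withHeader = spark.read.option("header", "true").csv(path)

    // With header=false the first row is treated as data and columns are named _c0, _c1.
    val withoutHeader = spark.read.option("header", "false").csv(path)

    (withHeader, withoutHeader)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("csv-header-round-trip")
      .getOrCreate()

    // Assumed output location: a fresh directory under the system temp folder.
    val path = Files.createTempDirectory("csv-header").resolve("out").toString

    val (withHeader, withoutHeader) = writeAndRead(spark, path)

    println(withHeader.columns.mkString(","))    // userId,amount
    println(withHeader.count())                  // 2
    println(withoutHeader.columns.mkString(",")) // _c0,_c1
    println(withoutHeader.count())               // 3

    spark.stop()
  }
}
```

Reading with a header setting that does not match the one used for writing either turns the header row into a data row or drops the first data row, so the two options should be kept in sync.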