July 2017
Beginner to intermediate
350 pages
English
In Chapter 5, Working with Data and Storage, we read a CSV file using SparkSession in the form of a Java RDD. This time, we will read the CSV file in the form of a dataset. Consider a CSV file with the following content:
emp_id,emp_name,emp_dept
1,Foo,Engineering
2,Bar,Admin
The SparkSession can be used to read this CSV file as follows:
Dataset<Row> csv = sparkSession.read().format("csv").option("header","true").load("C:\\Users\\sgulati\\Documents\\my_docs\\book\\testdata\\emp.csv");
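Conceptually, option("header","true") tells the reader to treat the first line of the file as column names rather than as a data row. The following plain-Java sketch applies that idea to the sample content. It is not Spark code and not how Spark implements its CSV source. The class name CsvHeaderSketch and its naive comma splitting are hypothetical, chosen only for illustration, and would break on quoted fields or embedded commas.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration (no Spark involved) of header handling:
// the first line supplies column names, every later line becomes one row.
// Assumption: simple data with no quoted fields or embedded commas.
public class CsvHeaderSketch {

    static List<Map<String, String>> parse(String content) {
        String[] lines = content.split("\\R");      // split on any line break
        String[] header = lines[0].split(",", -1);  // first line = column names
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue;                           // skip blank lines
            }
            String[] values = lines[i].split(",", -1);
            // LinkedHashMap keeps the columns in header order
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < header.length; c++) {
                // every value stays a String; missing trailing values become null
                row.put(header[c], c < values.length ? values[c] : null);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String content = "emp_id,emp_name,emp_dept\n1,Foo,Engineering\n2,Bar,Admin";
        for (Map<String, String> row : parse(content)) {
            System.out.println(row);
        }
    }
}
```

Running main prints one map per data row, for example {emp_id=1, emp_name=Foo, emp_dept=Engineering}. Keeping every value as a string mirrors Spark's CSV reader when schema inference is not enabled: emp_id is read as a string column unless you also set option("inferSchema","true") or supply a schema.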
Similar to the collect() function on an RDD, a dataset provides the show() function, which can be used to display the content of the dataset:
csv.show();
Executing this function will show the content of the CSV file along with ...