
Mastering Apache Spark by Mike Frampton


Importing and saving data

I have included this section on importing and saving data here, even though it is not purely about Spark SQL, so that I can introduce concepts such as the Parquet and JSON file formats. It also lets me cover, conveniently in one place, how to access and save data as plain text, as well as in the CSV, Parquet, and JSON formats.

Processing text files

Using the Spark context, you can load a text file into an RDD with the textFile method. The wholeTextFiles method reads the contents of a directory into an RDD, with one record per file. The following examples show how a file on the local file system (file://) or on HDFS (hdfs://) can be read into a Spark RDD. These examples show that the data will be partitioned into ...
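As a rough illustration of the two methods just described, here is a minimal sketch rather than the book's own listing. It assumes Spark is on the classpath and runs with a local master. The object name TextFileSketch, the temporary directory, and the file names a.txt and b.txt are all hypothetical and exist only so the example can run on its own. An hdfs:// URI could be passed to the same calls in place of the file:// URIs used here.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

object TextFileSketch {

  // Returns (lines read by textFile, files read by wholeTextFiles).
  def demo(): (Long, Long) = {
    // Hypothetical sample data, written to a temp directory so the sketch is runnable.
    // The directory is not cleaned up afterwards.
    val dir = Files.createTempDirectory("spark-text-sketch")
    Files.write(dir.resolve("a.txt"),
      "line one\nline two\nline three\n".getBytes(StandardCharsets.UTF_8))
    Files.write(dir.resolve("b.txt"),
      "another file\n".getBytes(StandardCharsets.UTF_8))

    // Assumption: a local master with two threads; a real job would normally
    // take its master from spark-submit instead.
    val conf = new SparkConf().setAppName("TextFileSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // textFile: one RDD element per line of the file (file:// URI here).
      val lines = sc.textFile(dir.resolve("a.txt").toUri.toString)
      println(s"textFile partitions: ${lines.partitions.length}")

      // wholeTextFiles: one (path, content) pair per file in the directory.
      val files = sc.wholeTextFiles(dir.toUri.toString)
      files.collect().foreach { case (path, content) =>
        println(s"$path holds ${content.length} characters")
      }

      (lines.count(), files.count())
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (lineCount, fileCount) = demo()
    println(s"lines: $lineCount, files: $fileCount")
  }
}
```

The sketch prints the partition count instead of assuming one, because the number of partitions depends on the input size and on the optional minPartitions argument to textFile.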
