Although this section is not purely about Spark SQL, I wanted to cover importing and saving data here so that I could introduce concepts such as the Parquet and JSON file formats. It also lets me cover, conveniently in one place, how to access and save data in plain text as well as in the CSV, Parquet, and JSON formats.
Using the Spark context, it is possible to load a text file into an RDD with the textFile method, while the wholeTextFiles method reads the contents of an entire directory into an RDD as (file name, file contents) pairs. The following examples show how a file on the local file system (file://) or on HDFS (hdfs://) can be read into a Spark RDD. These examples show that the data will be partitioned into ...
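As a minimal sketch of these calls, assuming a SparkContext named sc is already available (for instance in the spark-shell), and using hypothetical paths and host names:

```scala
// Read a local file into an RDD of lines; the optional second argument
// suggests a minimum number of partitions for the resulting RDD.
val localRdd = sc.textFile("file:///tmp/data.txt", 4)

// Read a file from HDFS in the same way (host name and port are
// placeholders for your cluster's NameNode).
val hdfsRdd = sc.textFile("hdfs://namenode:8020/tmp/data.txt")

// wholeTextFiles reads every file under a directory, returning an RDD of
// (fileName, fileContents) pairs rather than individual lines.
val dirRdd = sc.wholeTextFiles("file:///tmp/mydir")

// Inspect how the line-based data was partitioned.
println(localRdd.getNumPartitions)
```

The partition count reported at the end depends on the minimum-partitions hint and on the underlying file's block layout, which is why the number of partitions can differ between local and HDFS reads.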