March 2019
Beginner to intermediate
182 pages
4h 6m
English
In this section, we'll look at the second schema-based format, Parquet.
Parquet is a columnar format: the data is stored column-wise rather than row-wise, unlike the JSON, CSV, plain text, and Avro files we saw earlier.
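To make the difference between row-wise and column-wise storage concrete, here is a minimal plain-Scala sketch. It does not use Parquet or Spark at all; the User record, its fields, and the helper names are hypothetical and exist only for illustration. It shows the same three records held once as rows and once as columns, and why an aggregate over a single field only needs to touch one column.

```scala
// Hypothetical record type, used only for this illustration.
final case class User(id: Int, name: String, age: Int)

object ColumnarLayoutSketch {
  // Row-wise layout: the fields of each record are kept together.
  val rows: Seq[User] = Seq(
    User(1, "Ann", 34),
    User(2, "Bob", 27),
    User(3, "Cid", 45)
  )

  // Column-wise layout: all values of one field are kept together.
  final case class UserColumns(
      ids: Vector[Int],
      names: Vector[String],
      ages: Vector[Int]
  )

  // Pivot a sequence of rows into one vector per field.
  def toColumns(rs: Seq[User]): UserColumns =
    UserColumns(
      rs.map(_.id).toVector,
      rs.map(_.name).toVector,
      rs.map(_.age).toVector
    )

  // An aggregate over a single field reads only that column;
  // the ids and names are never touched.
  def averageAge(cols: UserColumns): Double =
    cols.ages.sum.toDouble / cols.ages.size

  def main(args: Array[String]): Unit = {
    val cols = toColumns(rows)
    println(cols.ages)
    println(averageAge(cols))
  }
}
```

A real columnar file format adds encoding, compression, and metadata on top of this idea, so the sketch should be read as an analogy for the layout, not as a description of the Parquet file structure.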
This is an important format for big data processing, because the columnar layout can make processing faster. In this section, we will focus on adding Parquet support to Spark, saving the data to the filesystem, reloading it, and then testing the result. Parquet is similar to Avro in that you get a parquet method for reading and writing, but this time the implementation is slightly different.
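The save-and-reload cycle described here can be sketched roughly as follows. This is a minimal example, not the book's own listing: it assumes the spark-sql module is already on the classpath, and the UserTransaction case class, the ParquetRoundTripSketch object, the sample values, and the temporary output path are all hypothetical names chosen for illustration. The calls it relies on are the parquet methods on Spark's DataFrameWriter and DataFrameReader.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical record type, used only for this illustration.
final case class UserTransaction(userId: String, amount: Int)

object ParquetRoundTripSketch {
  // Write a small DataFrame as Parquet under `path`, then read it back.
  def roundTrip(spark: SparkSession, path: String): DataFrame = {
    import spark.implicits._ // enables .toDF() on a Seq of case classes

    val df = Seq(
      UserTransaction("a", 100),
      UserTransaction("b", 200)
    ).toDF()

    // Overwrite keeps the sketch re-runnable against the same path.
    df.write.mode(SaveMode.Overwrite).parquet(path)

    // The schema is stored in the Parquet files, so none is supplied here.
    spark.read.parquet(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("parquet-sketch")
      .getOrCreate()
    try {
      // Hypothetical output location inside a fresh temporary directory.
      val path = Files.createTempDirectory("parquet-sketch")
        .resolve("transactions").toString
      val reloaded = roundTrip(spark, path)
      reloaded.printSchema()
      reloaded.show()
    } finally spark.stop()
  }
}
```

Because Parquet is schema-based, the reloaded DataFrame should carry the userId and amount columns without any schema being passed to the reader, which is what a test of the round trip would typically check.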
In the build.sbt file, for the Avro format, ...