March 2019
Beginner to intermediate
182 pages
4h 6m
English
In the previous chapters, we were focusing on processing and loading data. We learned about transformations, actions, joining, shuffling, and other aspects of Spark.
In this chapter, we will learn how to save data in the correct format and also save data in plain text format using Spark's standard API. We will also leverage JSON as a data format, and learn how to use standard APIs to save JSON. Spark has a CSV format and we will leverage that format as well. We will then learn more advanced schema-based formats, where support is required to import third-party dependencies. Following that, we will use Avro with Spark and learn how to use and save the data in a columnar format known as Parquet. By the end of ...
Read now
Unlock full access