June 2020
Intermediate to advanced
576 pages
15h 41m
English
This chapter covers
Ingestion is the first step of your big data pipeline. You will have to onboard the data into your instance of Spark, whether it runs in local mode or cluster mode. As you know by now, data in Spark is transient: when you shut down Spark, it’s all gone. You will learn how to import data from standard file formats, including CSV, JSON, XML, and text.
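To make the idea concrete, here is a minimal PySpark sketch of file ingestion. It is an illustration rather than the book's own code. The helper `ingest`, the `DEFAULT_OPTIONS` table, the `demo` function, and the path `data/books.csv` are hypothetical names chosen for this example. It also assumes that the `xml` format is available, which on many Spark versions requires the separate spark-xml package.

```python
# Hypothetical per-format defaults; adjust them to match your own files.
DEFAULT_OPTIONS = {
    "csv": {"header": "true", "inferSchema": "true"},
    "json": {"multiLine": "false"},   # one JSON document per line
    "xml": {"rowTag": "row"},         # assumption: each record is a <row> element
    "text": {},                       # each line becomes one row
}


def ingest(spark, fmt, path, **overrides):
    """Load a file into a DataFrame using Spark's generic reader.

    `spark` is expected to be a SparkSession; `overrides` replace the
    hypothetical defaults above for a single call.
    """
    if fmt not in DEFAULT_OPTIONS:
        raise ValueError(f"Unsupported format: {fmt}")
    options = {**DEFAULT_OPTIONS[fmt], **overrides}
    return spark.read.format(fmt).options(**options).load(path)


def demo():
    """Run the sketch locally; requires pyspark to be installed."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("ingestion-sketch")
        .master("local[*]")   # local mode; drop this line on a cluster
        .getOrCreate()
    )
    df = ingest(spark, "csv", "data/books.csv")  # hypothetical path
    df.show(5)
    spark.stop()  # the ingested data is gone once the session stops
```

Calling `demo()` on a machine with PySpark installed starts a local session, reads the file, and prints the first rows. The same `ingest` call should work unchanged in cluster mode, since only the session configuration differs.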
In this chapter, after learning about common behaviors among various parsers, you’ll use made-up datasets to illustrate specific cases, as well as ...