17. Exporting data and building full data pipelines
This chapter covers
- Exporting data from Spark
- Building a complete data pipeline, from ingestion to export
- Understanding the impact of partitioning
- Using Delta Lake as a database
- Using Spark with cloud storage
As you reach the end of this book, it is time to see how to export data. After all, why learn all this if the data just stays within Spark? I do appreciate learning as a hobby, but it is even better when you can bring actual business value, right?
This chapter is divided into three sections. The first section covers exporting data. As usual, you will use a real dataset, ingest it, and then export it. You will impersonate a NASA scientist and ...