© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
S. HainesModern Data Engineering with Apache Sparkhttps://doi.org/10.1007/978-1-4842-7452-1_7

7. Data Pipelines and Structured Spark Applications

Scott Haines1  
(1)
San Jose, CA, USA
 

There is a central processing paradigm that exists behind the scenes and can help connect just about everything you build as a data engineer. The processing paradigm is a physical as well as a mental model for effectively moving and processing data, known as the data pipeline . We first touched on the data pipeline in Chapter 1, while introducing the history and common components driving the modern data stack. This chapter will teach you how to write, test, and compile reliable ...

Get Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.