7. Data Pipelines and Structured Spark Applications

There is a central processing paradigm working behind the scenes that connects just about everything you build as a data engineer. This paradigm is both a physical and a mental model for reliably moving and processing data, known as the data pipeline. We first touched on the data pipeline in Chapter 1, while introducing the history and common components driving the modern data stack. This chapter will teach you how to write, test, and compile reliable ...
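The pipeline idea described above can be sketched as a sequence of stages, each consuming the output of the previous one. This is a minimal, illustrative sketch in plain Python (not the book's Spark code); the stage names and sample records are assumptions made for the example.

```python
from functools import reduce
from typing import Callable, Iterable, List

# A stage is any function that takes a batch of records and returns a new batch.
Stage = Callable[[List[dict]], List[dict]]

def extract() -> List[dict]:
    # Stand-in for reading from a real source (file, Kafka topic, table).
    return [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 0}]

def drop_inactive(records: List[dict]) -> List[dict]:
    # Transform stage: filter out records with no activity.
    return [r for r in records if r["clicks"] > 0]

def annotate(records: List[dict]) -> List[dict]:
    # Transform stage: enrich each surviving record.
    return [{**r, "active": True} for r in records]

def run_pipeline(source: List[dict], stages: Iterable[Stage]) -> List[dict]:
    # Thread the data through each stage in order.
    return reduce(lambda data, stage: stage(data), stages, source)

result = run_pipeline(extract(), [drop_inactive, annotate])
print(result)  # [{'user': 'a', 'clicks': 3, 'active': True}]
```

In Spark the same shape appears as chained DataFrame transformations, with the "load" step handled by a writer; the point here is only that a pipeline is composition of well-defined stages, which is what makes it testable.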
From Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications (O'Reilly).