© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
S. Haines, Modern Data Engineering with Apache Spark, https://doi.org/10.1007/978-1-4842-7452-1_2

2. Getting Started with Apache Spark

Scott Haines, San Jose, CA, USA

Apache Spark is the data engineer’s Swiss Army knife. As a unified framework, it provides the essential libraries engineers need to establish a common data narrative and work together across disciplines. From ingestion and validation of raw data to data cleansing, transformation, and aggregation, as well as analytical exploration of trends and generation of insights, Spark connects the dots between the various constituents in any successful data operation. It also supports consistent ...
