© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
S. Haines, Modern Data Engineering with Apache Spark, https://doi.org/10.1007/978-1-4842-7452-1_4

4. Transforming Data with Spark SQL and the DataFrame API

Scott Haines
San Jose, CA, USA

The previous chapter introduced you to using Docker and Apache Zeppelin to power your Spark explorations. You learned to transform loosely structured data into reliable, self-documenting, and, most importantly, highly structured data by applying explicit schemas. You wrote your first end-to-end ETL job, which let you encode this journey from raw data to structured data in a reliable way. However, the process we looked at is just the beginning and can ...
