S. Haines, Modern Data Engineering with Apache Spark, https://doi.org/10.1007/978-1-4842-7452-1_4

4. Transforming Data with Spark SQL and the DataFrame API

Scott Haines, San Jose, CA, USA

The previous chapter introduced you to using Docker and Apache Zeppelin to power your Spark explorations. You learned to transform loosely structured data into reliable, self-documenting, and, most importantly, highly structured data through the application of explicit schemas. You wrote your first end-to-end ETL job, which enabled you to encode this journey from raw data to structured data in a reliable way. However, the process we looked at is just the beginning and can ...

Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications by Scott Haines