4. Transforming Data with Spark SQL and the DataFrame API

The previous chapter introduced you to using Docker and Apache Zeppelin to power your Spark explorations. You learned to transform loosely structured data into reliable, self-documenting, and most importantly, highly structured data through the application of explicit schemas. You wrote your first end-to-end ETL job, which enabled you to encode this journey from raw data to structured data in a repeatable, reliable way. However, the process we looked at is just the beginning and can ...
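As a quick refresher on that explicit-schema pattern, a minimal sketch in Scala might look like the following. The file path, column names, and schema here are hypothetical stand-ins, not the book's actual dataset; the point is that declaring the schema up front skips inference and fails fast when the raw data drifts from the expected shape.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructField, StringType, StructType, TimestampType}

object ExplicitSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explicit-schema-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical schema for a raw CSV of customer events
    val eventSchema = StructType(Seq(
      StructField("customerId", StringType, nullable = false),
      StructField("eventType", StringType, nullable = false),
      StructField("eventTime", TimestampType, nullable = true)
    ))

    // Reading with an explicit schema (rather than inferSchema)
    // makes the DataFrame self-documenting and deterministic.
    val events = spark.read
      .schema(eventSchema)
      .option("header", "true")
      .csv("data/raw/customer_events.csv") // hypothetical path

    events.printSchema()
    spark.stop()
  }
}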