12 Transforming your data

This chapter covers

  • Learning the data transformation process
  • Performing record-level data transformation
  • Learning data discovery and data mapping
  • Implementing a data transformation process on a real-world dataset
  • Verifying the result of data transformations
  • Joining datasets to get richer data and insights

This chapter is probably the cornerstone of the book. All the knowledge you gathered through the first 11 chapters has brought you to these key questions: “Once I have all this data, how can I transform it, and what can I do with it?”

Apache Spark is all about data transformation, but what precisely is data transformation? How can you perform such transformations in a repeatable and procedural way? Think of it as ...
