13. Transforming entire documents

This chapter covers

  • Transforming entire documents for better analytics or to condense data
  • Navigating the catalog of static functions
  • Using static functions for data transformation

This chapter focuses on the transformation of entire documents: Spark will ingest a complete document, transform it, and make it available in another format.

In the previous chapter, you read about data transformations. The next logical step is to transform entire documents and their structure. For example, JSON is great for transporting data, but it is a real pain to traverse when you want to do analytics. Similarly, joined datasets carry so much redundant data that getting a concise, synthetic view of them is painful. Apache Spark can help ...
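To make the JSON point concrete, here is a minimal plain-Python sketch of flattening one nested document into flat, analytics-friendly rows. It is not Spark code: the order document and the `flatten_order` helper are hypothetical, and the sketch only mimics with the standard library the kind of result you would typically get in Spark by applying a static function such as `explode()` to an array column and selecting nested fields.

```python
import json

# Hypothetical nested document: one order holding a list of line items.
raw = """
{
  "orderId": 42,
  "customer": {"name": "Ada", "country": "UK"},
  "items": [
    {"sku": "A1", "qty": 2, "price": 3.5},
    {"sku": "B7", "qty": 1, "price": 10.0}
  ]
}
"""


def flatten_order(doc):
    """Turn one nested order into flat rows, one per line item.

    Parent fields are copied onto every row, similar in spirit to
    exploding an array column and selecting nested fields in Spark.
    This helper is hypothetical and exists only for illustration.
    """
    rows = []
    for item in doc["items"]:
        rows.append({
            "orderId": doc["orderId"],
            "customerName": doc["customer"]["name"],
            "country": doc["customer"]["country"],
            "sku": item["sku"],
            "qty": item["qty"],
            "lineTotal": item["qty"] * item["price"],
        })
    return rows


rows = flatten_order(json.loads(raw))
for row in rows:
    print(row)

# With flat rows, an aggregate is a simple pass over the rows
# instead of a walk through a nested tree.
total = sum(r["lineTotal"] for r in rows)
print("order total:", total)
```

Notice that `orderId` and the customer fields are repeated on every row: flattening makes the data easier to query, but it introduces exactly the kind of redundancy described above for joined datasets.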
