2 Data Transformation and Data Manipulation with Apache Spark

Apache Spark is a powerful distributed computing framework that can handle large-scale data processing tasks. One of the most common tasks when working with data is loading it from various sources and writing it into various formats. In this hands-on chapter, you will gain a comprehensive understanding of how to transform and manipulate data using Apache Spark.

In this chapter, we’re going to cover the following main recipes:

Applying basic transformations to data with Apache Spark
Filtering data with Apache Spark
Performing joins with Apache Spark
Performing aggregations with Apache Spark
Using window functions with Apache Spark
Writing custom UDFs in Apache Spark
Handling null ...

Data Engineering with Databricks Cookbook by Pulkit Chadha