Data Algorithms with Spark

by Mahmoud Parsian
April 2022
Content level: Intermediate to advanced
435 pages
9h 44m
English
O'Reilly Media, Inc.

Chapter 3. Mapper Transformations

This chapter introduces the most common Spark mapper transformations through simple working examples. Without a clear understanding of what each transformation does, it is hard to apply them correctly to a data problem. We will examine mapper transformations in the context of the RDD data abstraction. A mapper is a function that is applied to every element of a source RDD to generate a target RDD. For example, a mapper can transform a String record into tuples, (key, value) pairs, or whatever output you need. Informally, a mapper transforms a source RDD[V] into a target RDD[T], where V and T are the element data types of the source and target RDDs, respectively. You can apply mapper transformations to DataFrames as well, either by applying DataFrame functions (using select() and UDFs) to all rows or by converting your DataFrame (a table of rows and columns) to an RDD and then using Spark’s mapper transformations.
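
To make the RDD[V] to RDD[T] idea concrete, here is a minimal sketch of a mapper that turns String records into (key, value) pairs. The function name parse_record and the sample records are hypothetical, chosen only for illustration. The sketch applies the mapper to a plain Python list so that it runs without a Spark cluster; with a live SparkContext, the same function would typically be passed to the RDD's map() transformation, as noted in the comment.

```python
from typing import List, Tuple


def parse_record(record: str) -> Tuple[str, int]:
    """Mapper: turn a 'name,count' String record into a (key, value) pair."""
    name, count = record.split(",")
    return (name.strip(), int(count))


# Hypothetical source records, standing in for the elements of a source RDD[str].
source: List[str] = ["fox,3", "cat, 7", "fox,5"]

# Local stand-in for the transformation. Assuming a SparkContext named `sc`,
# the Spark equivalent would be: sc.parallelize(source).map(parse_record)
target: List[Tuple[str, int]] = [parse_record(r) for r in source]

print(target)
```

Here V is a string and T is a (string, integer) tuple. The mapper produces exactly one target element per source element, which is the defining property of map().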

Data Abstractions and Mappers

Spark has many transformations and actions, but this chapter focuses on the ones most often used in building Spark applications. Spark’s mapper transformations are simple yet powerful, and they make ETL operations straightforward to express.

As I’ve mentioned, the RDD is an important data abstraction in Spark that is suitable for unstructured and semi-structured ...

Publisher Resources

ISBN: 9781492082378