Operations in Spark

RDDs support two types of operations:

Transformations
Actions

Transformations

A transformation applies a function to an existing dataset and creates a new dataset from the result. Transformations are evaluated lazily: Spark only records them and computes them when an action needs their output. A transformation that does not contribute to the result of any action is never executed, which improves efficiency.
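The lazy behaviour described above can be illustrated with a small toy model in plain Python. This is only a sketch and not Spark's API or implementation: the class `LazyDataset` and the helper `square` are hypothetical names invented for this example. Each transformation merely records a step, and only the action `collect` runs the recorded steps.

```python
from typing import Any, Callable, Iterable, List, Tuple


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark code)."""

    def __init__(self, source: Iterable[Any], steps: Tuple = ()):
        self._source = source
        self._steps = steps  # recorded (kind, func) pairs, not yet executed

    # Transformations: return a new dataset and compute nothing.
    def map(self, func: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", func),))

    def filter(self, func: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", func),))

    # Action: runs the recorded steps and returns the result.
    def collect(self) -> List[Any]:
        items = iter(self._source)
        for kind, func in self._steps:
            # Inside a method, map/filter refer to the Python built-ins.
            items = map(func, items) if kind == "map" else filter(func, items)
        return list(items)


calls = []  # records every invocation of square, to make laziness visible


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset([1, 2, 3, 4])
squares = numbers.map(square)                 # recorded, not run
unused = numbers.map(lambda x: 1 / 0)         # no action uses it, so it never runs
evens = squares.filter(lambda x: x % 2 == 0)  # recorded, not run

calls_before = list(calls)  # still empty: no function has been called yet
result = evens.collect()    # the action triggers the computation
print(calls_before, result)
```

Before `collect` is called, `square` has not been invoked at all, and the `unused` dataset never raises a division error because no action ever asks for it.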

The transformations listed in the Apache Spark documentation at https://spark.apache.org/docs/latest/programming-guide.html#transformations include the following:

Transformation	Meaning
`map(func)`	Return a new distributed dataset formed by passing each element of the source through a function `func`.
`filter(func)`	Return a new dataset formed ...


Hadoop Essentials by Swizec Teller
