Transformations

We've used a few transformation functions in the examples in this chapter, but I would like to share with you a list of the most commonly used transformation functions in Apache Spark. You can find a complete list of functions in the official documentation at http://bit.ly/RDDTransformations.

Most Common Transformations

map(func)                                  coalesce(numPartitions)
filter(func)                               repartition(numPartitions)
flatMap(func)                              repartitionAndSortWithinPartitions(partitioner)
mapPartitions(func)                        join(otherDataset, [numTasks])
mapPartitionsWithIndex(func)               cogroup(otherDataset, [numTasks])
sample(withReplacement, fraction, seed)    cartesian(otherDataset)
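
To give a feel for what some of the listed transformations compute, here is a minimal sketch in plain Python. It does not use Spark at all: a dataset is modeled as a list of partitions, and the helper names (filter_, flat_map, map_partitions_with_index, join, cartesian) are hypothetical stand-ins that only mimic the semantics of the corresponding RDD methods. Real Spark transformations are lazy and distributed; this sketch is eager and runs locally.

```python
from itertools import product

# Assumption: a list of lists stands in for an RDD with two partitions.
partitions = [[1, 2, 3], [4, 5, 6]]

def filter_(parts, pred):
    """Keep only the records for which pred is true, partition by partition."""
    return [[x for x in p if pred(x)] for p in parts]

def flat_map(parts, func):
    """Apply func to each record and flatten the sequences it returns."""
    return [[y for x in p for y in func(x)] for p in parts]

def map_partitions_with_index(parts, func):
    """Call func(index, iterator) once per partition, not once per record."""
    return [list(func(i, iter(p))) for i, p in enumerate(parts)]

def join(left, right):
    """Inner join of two flat lists of (key, value) pairs on the key."""
    return [(k, (v, w)) for (k, v) in left for (k2, w) in right if k == k2]

def cartesian(left, right):
    """Every pairing of a record from left with a record from right."""
    return list(product(left, right))

evens = filter_(partitions, lambda x: x % 2 == 0)
repeated = flat_map(partitions, lambda x: [x, x * 10])
tagged = map_partitions_with_index(
    partitions, lambda i, it: ((i, x) for x in it)
)
joined = join([("a", 1), ("b", 2)], [("a", "x"), ("a", "y")])
pairs = cartesian([1, 2], ["x"])

print(evens)   # [[2], [4, 6]]
print(joined)  # [('a', (1, 'x')), ('a', (1, 'y'))]
```

Note how filter_ and flat_map never move records between partitions, while join has to match keys across both inputs; in Spark that difference is what separates cheap per-partition transformations from ones that may shuffle data.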

Map(func)

The map transformation is the most commonly used and the simplest of transformations ...
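
As a rough illustration of the idea, the sketch below models map in plain Python rather than Spark. The helper map_ and the sample data are hypothetical; the point is only that map applies a function to every record and yields exactly one output per input, leaving the source untouched. In PySpark, assuming a SparkContext named sc, the equivalent call would be sc.parallelize(lines).map(len).collect().

```python
def map_(records, func):
    """Return a new list with func applied to each record, one output per input."""
    return [func(x) for x in records]

# Hypothetical sample data standing in for the records of an RDD.
lines = ["spark is fast", "rdds are immutable"]

lengths = map_(lines, len)        # length of each line
upper = map_(lines, str.upper)    # a second, independent transformation

print(lengths)  # [13, 18]
print(lines)    # the input is unchanged
```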
