Transformations
We've used a few transformation functions in the examples in this chapter, but I would like to share with you a list of the most commonly used transformation functions in Apache Spark. You can find a complete list of functions in the official documentation: http://bit.ly/RDDTransformations.
Most Common Transformations
coalesce(numPartitions)
repartition(numPartitions)
repartitionAndSortWithinPartitions(partitioner)
join(otherDataset, [numTasks])
cogroup(otherDataset, [numTasks])
cartesian(otherDataset)
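To make the difference between join, cogroup, and cartesian concrete, here is a minimal sketch that models their results on ordinary Python lists of (key, value) pairs. It does not call Spark: the helper names join_pairs, cogroup_pairs, and cartesian_pairs and the sample data are hypothetical and exist only for illustration. It also sorts or preserves order for readability, whereas Spark makes no ordering guarantee for these results.

```python
from collections import defaultdict
from itertools import product


def join_pairs(left, right):
    """Model of join: one (k, (v, w)) per matching pair of values for key k."""
    right_by_key = defaultdict(list)
    for k, w in right:
        right_by_key[k].append(w)
    return [(k, (v, w)) for k, v in left for w in right_by_key.get(k, [])]


def cogroup_pairs(left, right):
    """Model of cogroup: (k, (all left values, all right values)) for every key."""
    grouped = defaultdict(lambda: ([], []))
    for k, v in left:
        grouped[k][0].append(v)
    for k, w in right:
        grouped[k][1].append(w)
    # Sorted by key only to make the output easy to read.
    return sorted(grouped.items())


def cartesian_pairs(left, right):
    """Model of cartesian: every (a, b) combination of the two datasets."""
    return list(product(left, right))


# Hypothetical sample data.
orders = [("alice", 30), ("bob", 12), ("alice", 7)]
cities = [("alice", "Oslo"), ("carol", "Lima")]

joined = join_pairs(orders, cities)
cogrouped = cogroup_pairs(orders, cities)
crossed = cartesian_pairs([1, 2], ["a", "b"])

print(joined)     # only keys present in both datasets
print(cogrouped)  # every key, with possibly empty value lists
print(crossed)    # len(left) * len(right) pairs
```

Note how join drops keys that appear in only one dataset, while cogroup keeps every key and leaves an empty list on the side where it is missing. The cartesian result grows as the product of the two input sizes, which is why it can be costly on large datasets.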
Map(func)
The map transformation is the most commonly used and the simplest of transformations ...
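As a rough illustration of what map(func) does, the sketch below models a dataset as a list of partitions and applies a function to every element. This is a plain-Python model rather than Spark code: the name map_model and the sample partitions are hypothetical, and real Spark evaluates transformations lazily, which this model does not attempt to show.

```python
def map_model(partitions, func):
    """Apply func to each element, keeping the partition layout unchanged."""
    return [[func(x) for x in part] for part in partitions]


# Hypothetical dataset split into three partitions.
partitions = [[1, 2], [3], [4, 5, 6]]

squared = map_model(partitions, lambda x: x * x)

print(squared)
```

Each input element produces exactly one output element, and the number of partitions stays the same, so no data has to move between partitions.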