July 2017
Intermediate to advanced
796 pages
18h 55m
English
Data structure-based transformations are transformation functions that operate on the underlying data structure of the RDD: its partitions. With these functions, you work on partitions as a whole rather than on the individual elements inside the RDD. They are essential in any Spark program beyond the simplest ones, where you need more control over the partitions and over how those partitions are distributed across the cluster. Typically, you can improve performance by redistributing the data partitions according to the cluster state, the size of the data, and the requirements of the use case.
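To make the idea of working on partitions rather than elements concrete, here is a minimal sketch in plain Python. It is a toy model, not the Spark API: an "RDD" is simply a list of lists, and the helper names `map_partitions`, `coalesce`, and `repartition` are hypothetical functions written for this illustration. They are named after the Spark RDD operations `mapPartitions`, `coalesce`, and `repartition`, which they only loosely imitate. In particular, the round-robin placement used here is an assumption of the sketch; Spark itself decides placement based on data locality and its partitioner.

```python
from typing import Callable, Iterable, Iterator, List

# Toy model: an "RDD" is a list of partitions, each partition a list of elements.
Partitions = List[list]


def map_partitions(parts: Partitions,
                   fn: Callable[[Iterator], Iterable]) -> Partitions:
    """Call fn once per partition, passing an iterator over that partition."""
    return [list(fn(iter(p))) for p in parts]


def coalesce(parts: Partitions, n: int) -> Partitions:
    """Reduce to at most n partitions by concatenating whole partitions.

    Elements are never split out of their original partition, which models
    a merge that avoids a full shuffle.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    n = min(n, len(parts))  # this sketch never increases the partition count
    merged: Partitions = [[] for _ in range(n)]
    for i, p in enumerate(parts):
        merged[i % n].extend(p)  # assumption: round-robin over whole partitions
    return merged


def repartition(parts: Partitions, n: int) -> Partitions:
    """Redistribute every element into exactly n partitions.

    Every element may move, which models a full shuffle.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    out: Partitions = [[] for _ in range(n)]
    i = 0
    for p in parts:
        for x in p:
            out[i % n].append(x)  # assumption: round-robin over elements
            i += 1
    return out


# Three deliberately unbalanced partitions.
parts = [[1, 2, 3], [4], [5, 6, 7, 8, 9, 10]]

per_partition_sums = map_partitions(parts, lambda it: [sum(it)])
fewer = coalesce(parts, 2)
balanced = repartition(parts, 3)

print(per_partition_sums)          # [[6], [4], [45]]
print([len(p) for p in fewer])     # [9, 1]
print([len(p) for p in balanced])  # [4, 3, 3]
```

In this model, `coalesce` is cheap but can leave partitions unbalanced, while `repartition` evens out the sizes at the cost of moving every element. That trade-off is the kind of decision the paragraph above refers to when it mentions redistributing partitions according to cluster state and data size.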
Examples of such transformations are:
The following ...