January 2019
Beginner to intermediate
154 pages
4h 31m
English
Data partitioning plays a crucial role in distributed computing because it determines the degree of parallelism available to an application: in Spark, each partition is processed by one task, so the number of partitions caps how many tasks can run concurrently. Understanding partitions and defining them appropriately can significantly improve the performance of Spark jobs. There are two ways to control the degree of parallelism for RDD operations:
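The following example illustrates why the partition count sets the degree of parallelism. It is a minimal pure-Python sketch, not Spark code. It assumes a simple hash-partitioning scheme in which each record goes to partition `hash(key) % num_partitions`, and it runs one "task" per partition in a thread pool. The helper names `hash_partition`, `sum_partition`, and `run_job` are hypothetical and do not belong to any Spark API.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(records, num_partitions):
    """Hypothetical helper: split (key, value) records into num_partitions
    lists by key hash, so all records with the same key share a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        # Python's % with a positive divisor always yields 0..num_partitions-1.
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def sum_partition(partition):
    """Hypothetical per-partition task: sum the values held in one partition."""
    return sum(value for _, value in partition)


def run_job(records, num_partitions):
    """Hypothetical job: one task per partition, so at most num_partitions
    tasks can be in flight at once. Threads here only model task scheduling;
    they do not give a CPU speedup in CPython."""
    partitions = hash_partition(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_sums = list(pool.map(sum_partition, partitions))
    # Combine the per-partition results, as a final reduce step would.
    return len(partitions), sum(partial_sums)


if __name__ == "__main__":
    data = [(k, k) for k in range(100)]
    for n in (1, 4):
        tasks, total = run_job(data, n)
        print(f"{tasks} partition(s) -> up to {tasks} concurrent task(s), total={total}")
```

Raising the partition count from 1 to 4 allows up to four tasks to run at once, and the combined result stays the same. In a real Spark cluster, the number of tasks that actually run concurrently is also bounded by the executor cores available.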