July 2017
Intermediate to advanced
796 pages
18h 55m
English
HashPartitioner is the default partitioner in Spark. It computes a hash code for the key of each RDD element and takes that value modulo the number of partitions to get the partition index. All elements whose keys have the same hash code therefore end up in the same partition, as the following formula shows:
partitionIndex = hashcode(key) % numPartitions
The following is an example of the String hashCode() function and how we can generate partitionIndex:
scala> val str = "hello"
str: String = hello

scala> str.hashCode
res206: Int = 99162322

scala> val numPartitions = 8
numPartitions: Int = 8

scala> val partitionIndex = str.hashCode % numPartitions
partitionIndex: Int = 2
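The same calculation can be wrapped in a small helper. The following is a minimal sketch in plain Scala, not Spark's own code: the names `HashPartitionSketch`, `nonNegativeMod`, and `partitionIndex` are hypothetical and chosen for illustration. It also covers a case the REPL session does not: `%` in Scala can return a negative value when the hash code is negative, so the sketch shifts the result back into the range `0` until `numPartitions`, and it sends a `null` key to partition 0.

```scala
// Illustrative sketch only; the object and method names are assumptions, not Spark API.
object HashPartitionSketch {

  // Modulo that never returns a negative value, so the result is a valid partition index.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  // partitionIndex = hashcode(key) % numPartitions, adjusted to be non-negative.
  // A null key has no hash code, so this sketch assigns it to partition 0.
  def partitionIndex(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    val numPartitions = 8

    // Same key as in the REPL session above: "hello" lands in partition 2.
    println(s"hello -> ${partitionIndex("hello", numPartitions)}")

    // An Int key hashes to itself; -7 % 8 is -7, which is shifted up to 1.
    println(s"-7 -> ${partitionIndex(-7, numPartitions)}")
  }
}
```

Because the index depends only on the key's hash code and the partition count, two equal keys always map to the same partition, while changing `numPartitions` can move a key to a different one.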