March 2019
Beginner to intermediate
182 pages
4h 6m
English
In this section, we will use keyBy() operations to reduce shuffling. We will cover the following:
We will load randomly partitioned data, this time using the RDD API. We will then repartition the data in a meaningful way and inspect what happens underneath, as we did with the DataFrame and Dataset APIs. Finally, we will learn how to use the keyBy() function to give our data some structure and to pre-partition it in the RDD API.
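The idea behind keying and pre-partitioning can be sketched without a cluster. The following is a minimal plain-Python model, not Spark code: `key_by` and `partition_by` are hypothetical helpers that mimic pairing each record with a key and then placing records by key, and the sample records and the CRC32 hash are assumptions made for illustration. In Spark, the corresponding RDD calls would be `keyBy` followed by `partitionBy`.

```python
import zlib
from collections import defaultdict


def key_by(records, key_fn):
    """Pair each record with its key: x -> (key_fn(x), x)."""
    return [(key_fn(record), record) for record in records]


def partition_by(pairs, num_partitions):
    """Place each (key, record) pair in a partition chosen from the key alone."""
    partitions = [[] for _ in range(num_partitions)]
    for key, record in pairs:
        # CRC32 is a stable stand-in for a partitioner's hash function (assumption).
        index = zlib.crc32(str(key).encode("utf-8")) % num_partitions
        partitions[index].append((key, record))
    return partitions


# Hypothetical input records: (user_id, amount).
records = [
    ("user_1", 100),
    ("user_2", 50),
    ("user_1", 25),
    ("user_3", 10),
    ("user_2", 5),
]

keyed = key_by(records, lambda record: record[0])
partitions = partition_by(keyed, num_partitions=2)

# Record which partitions each key ended up in.
partitions_per_key = defaultdict(set)
for index, partition in enumerate(partitions):
    for key, _ in partition:
        partitions_per_key[key].add(index)

# Per-key totals, accumulated one partition at a time. Because each key lives
# in a single partition, no data has to move between partitions to do this.
totals = {}
for partition in partitions:
    for key, (_, amount) in partition:
        totals[key] = totals.get(key, 0) + amount
```

Because the partition index depends only on the key, all records for a given user land together. That is the property that lets later per-key operations avoid a further shuffle once the data has been partitioned by key.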
Here is the test we will be using in this section. We are creating two random input records. The first record has a random user ...