March 2019
Beginner to intermediate
182 pages
4h 6m
English
In this section, we will explore why we use aggregateByKey instead of groupBy.
We will cover the following topics:
First, we will create our array of user transactions, as shown in the following example:
val keysWithValuesList = Array(
  UserTransaction("A", 100),
  UserTransaction("B", 4),
  UserTransaction("A", 100001),
  UserTransaction("B", 10),
  UserTransaction("C", 10)
)
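We will then use parallelize to create an RDD and keyBy to key it by userId, as we want our data to be key-wise. This is shown in the following example:
// spark must be a SparkContext here; a SparkSession has no parallelize method
val data = spark.parallelize(keysWithValuesList)
val keyed = data.keyBy(_.userId)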
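To make the comparison concrete, the following is a minimal sketch of how the keyed RDD above could be reduced with aggregateByKey. It rests on a few assumptions that are not in the original text: the case class field name amount, the object name AggregateByKeySketch, the helper totalsPerUser, and a local two-thread master are all hypothetical. Only userId is confirmed by the keyBy call above. The point it illustrates is that aggregateByKey folds values into a small accumulator inside each partition before the shuffle, whereas grouping first would move every value for a key across the network.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Assumed record shape: only `userId` appears in the text above;
// the field name `amount` is a hypothetical choice for this sketch.
case class UserTransaction(userId: String, amount: Int)

object AggregateByKeySketch {

  // Sums the transaction amounts per user with aggregateByKey.
  def totalsPerUser(sc: SparkContext): Map[String, Long] = {
    val keysWithValuesList = Array(
      UserTransaction("A", 100),
      UserTransaction("B", 4),
      UserTransaction("A", 100001),
      UserTransaction("B", 10),
      UserTransaction("C", 10)
    )

    // RDD[(String, UserTransaction)], keyed by user id.
    val keyed = sc.parallelize(keysWithValuesList).keyBy(_.userId)

    keyed
      .aggregateByKey(0L)(
        // seqOp: fold one transaction into the running sum within a partition.
        (sum, tx) => sum + tx.amount,
        // combOp: merge the partial sums produced by different partitions.
        (left, right) => left + right
      )
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; the master URL is an assumption.
    val session = SparkSession.builder()
      .master("local[2]")
      .appName("aggregateByKey-sketch")
      .getOrCreate()
    try {
      println(totalsPerUser(session.sparkContext))
    } finally {
      session.stop()
    }
  }
}
```

The zero value 0L fixes the accumulator type to Long, so the result type can differ from the value type (UserTransaction). That is the main thing aggregateByKey offers over reduceByKey, and it avoids building a full per-key collection as groupByKey would.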