Chaining a new RDD with the parent

We first created the MultipliedRDD class. Its constructor takes two parameters:

  • The parent RDD of records, that is, RDD[Record]
  • A multiplier, that is, Double

In our case, there could be a chain of multiple RDDs, which means that our RDD may sit on top of several other RDDs. So, the RDD we receive is not always the root of the directed acyclic graph (DAG). We are just extending an RDD of the Record type, and so we need to pass the RDD that is being extended to the base RDD class.
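To make the idea of chaining concrete, here is a minimal sketch in plain Python that does not use Spark at all. ToyRDD, SourceRDD, the lineage helper, and the fields of Record are hypothetical names invented for illustration; only MultipliedRDD and its two parameters come from the text above. It shows how passing the extended RDD to the base class links each new RDD to its parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical fields; the text does not say what a Record holds.
    amount: float
    desc: str


class ToyRDD:
    """Hypothetical stand-in for an RDD: it only remembers its parent."""

    def __init__(self, parent: Optional["ToyRDD"] = None):
        self.parent = parent

    def lineage(self) -> List[str]:
        """Walk the parent links from this dataset back to the source."""
        names, node = [], self
        while node is not None:
            names.append(type(node).__name__)
            node = node.parent
        return names


class SourceRDD(ToyRDD):
    """A dataset with no parent, that is, the start of the chain."""

    def __init__(self, records: List[Record]):
        super().__init__(parent=None)
        self.records = list(records)


class MultipliedRDD(ToyRDD):
    def __init__(self, prev: ToyRDD, multiplier: float):
        # Pass the RDD being extended up to the base class.
        super().__init__(parent=prev)
        self.multiplier = multiplier


source = SourceRDD([Record(10.0, "d1")])
doubled = MultipliedRDD(source, 2.0)
chained = MultipliedRDD(doubled, 3.0)  # its parent is itself a MultipliedRDD

print(chained.lineage())
# ['MultipliedRDD', 'MultipliedRDD', 'SourceRDD']
```

Note that the parent of chained is another MultipliedRDD rather than the source, which is why the RDD passed in cannot be assumed to be the start of the graph.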

RDD has many methods, and we can override any of them. This time, however, we override only the compute method, which is where the multiplication is applied. Here, we get a Partition split and TaskContext ...
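The role of an overridden compute method can be sketched with a small Spark-free model in plain Python. This is an illustration under stated assumptions, not Spark's API: ToyRDD and SourceRDD are hypothetical classes, the fields of Record are invented, the split is modeled as a plain partition index, and the context is an empty dictionary standing in for TaskContext.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical fields; the text does not say what a Record holds.
    amount: float
    desc: str


class ToyRDD:
    """Hypothetical base class: subclasses say how to compute one partition."""

    def __init__(self, parent: Optional["ToyRDD"] = None):
        self.parent = parent

    def compute(self, split: int, context: dict) -> Iterator[Record]:
        raise NotImplementedError


class SourceRDD(ToyRDD):
    """Holds its data in memory, already divided into partitions."""

    def __init__(self, partitions: List[List[Record]]):
        super().__init__(parent=None)
        self.partitions = partitions

    def compute(self, split: int, context: dict) -> Iterator[Record]:
        return iter(self.partitions[split])


class MultipliedRDD(ToyRDD):
    def __init__(self, prev: ToyRDD, multiplier: float):
        super().__init__(parent=prev)
        self.multiplier = multiplier

    def compute(self, split: int, context: dict) -> Iterator[Record]:
        # Pull the same partition from the parent, then scale each record.
        for record in self.parent.compute(split, context):
            yield Record(record.amount * self.multiplier, record.desc)


source = SourceRDD([
    [Record(10.0, "d1"), Record(20.0, "d2")],  # partition 0
    [Record(5.0, "d3")],                       # partition 1
])
chained = MultipliedRDD(MultipliedRDD(source, 2.0), 3.0)

result = list(chained.compute(0, {}))
print([r.amount for r in result])
# [60.0, 120.0]
```

Each compute call asks its parent for the same partition, so a chain of two MultipliedRDD instances applies both multipliers in turn, and nothing is evaluated until the resulting iterator is consumed.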
