Basics of RDD operation

Let's now go through some basics of RDD operations. The best way to understand what a function does is to read its documentation, which gives a rigorous description of what the function performs.

This matters because the documentation is the authoritative source for how a function is defined and how it is meant to be used. By reading it, we keep our understanding as close to the source as possible. The relevant documentation is at https://spark.apache.org/docs/latest/rdd-programming-guide.html.

Let's start with the map function. The map function returns a new RDD by applying the function f to each element of this RDD. In other words, it works ...
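As a rough illustration of that element-wise behavior, here is a minimal sketch that uses Python's built-in map on a local list as a stand-in for an RDD. The names add_one and data are hypothetical, and the PySpark calls in the comments assume a SparkContext is already available under the name sc.

```python
# A plain-Python analogue of RDD.map(f): apply f to every element
# and get a new collection back, leaving the input untouched.

def add_one(x):
    """Hypothetical example function playing the role of f."""
    return x + 1

data = [1, 2, 3, 4]  # stand-in for the elements of an RDD

# Built-in map applies add_one to each element in turn.
result = list(map(add_one, data))
print(result)

# With a live SparkContext (assumed here to be named `sc`), the
# corresponding PySpark calls would look like this:
#   rdd = sc.parallelize(data)    # distribute the local list as an RDD
#   mapped = rdd.map(add_one)     # returns a new RDD; rdd itself is unchanged
#   mapped.collect()              # brings the mapped elements back to the driver
```

The local version runs eagerly, whereas in Spark the map call only describes the transformation and nothing is computed until an action such as collect is invoked.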
