March 2019
Beginner to intermediate
182 pages
4h 6m
English
We will be answering the following three main questions in this section:
You can check the documentation at https://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=map#pyspark.RDD.map.
The map function takes two arguments, one of which is optional. The first argument, f, is the function that map applies to every element of the RDD. The second argument, preservesPartitioning, is optional and is False by default.
The documentation states that map returns a new RDD by applying a function to each element of this RDD, and this function refers ...