Spark has a more limited Python API than Java and Scala, but it supports most of the core functionality.
The hallmark of a
MapReduce system lies in two commands:
reduce. You've seen the
map function used in the earlier chapters. The
map function works by taking in a function that works on each individual element in the input RDD and produces a new output element. For example, to produce a new RDD where you have added one to every number, you would use
rdd.map(lambda x: x+1). It's important to understand that the
map function and the other Spark functions do not transform the existing elements; instead, they return a new RDD with new elements. The
reduce function takes a function that operates in pairs to ...