Transformations

Spark transformations are operations that take an RDD as input and produce one or more RDDs as output. All transformations are lazy: Spark builds the logical execution plan in the form of a directed acyclic graph (DAG), but actual execution happens only when an action is called.
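The lazy behavior described above can be sketched with a small pure-Python toy model. This is not Spark's API or implementation: the class name LazyDataset and the helper traced_double are hypothetical, chosen only to show that transformations merely record a plan and that the work runs when an action (here, collect()) is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, source, steps=()):
        self._source = source        # the input records
        self._steps = tuple(steps)   # recorded plan: (kind, function) pairs

    def map(self, fn):
        # Transformation: returns a new dataset with one more planned step.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only extends the plan, nothing is computed.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is the recorded plan applied to the data.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []  # records every input that traced_double actually processes


def traced_double(x):
    calls.append(x)
    return x * 2


pipeline = LazyDataset([1, 2, 3, 4]).map(traced_double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # the plan exists, but no record was processed
result = pipeline.collect()       # the action triggers the computation
```

In this sketch calls stays empty until collect() runs, which mirrors how Spark defers work until an action such as collect() or count() is invoked on an RDD.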

Transformations can be classified into two categories:

Narrow transformations

Wide transformations

In a narrow transformation, each partition of the child RDD is computed using data from a single partition of the parent RDD. Examples are map() and filter().
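As a rough illustration of the narrow case, the following pure-Python sketch models an RDD as a list of partitions, each a plain list. The helpers narrow_map and narrow_filter are hypothetical names, not Spark functions. They show that every child partition is derived from exactly one parent partition, so no data moves between partitions.

```python
def narrow_map(partitions, fn):
    # Child partition i depends only on parent partition i.
    return [[fn(x) for x in part] for part in partitions]


def narrow_filter(partitions, pred):
    # Filtering may shrink a partition, but never mixes partitions.
    return [[x for x in part if pred(x)] for part in partitions]


# A toy "parent RDD" with three partitions (an assumption for illustration).
parent = [[1, 2, 3], [4, 5], [6]]

doubled = narrow_map(parent, lambda x: x * 2)
over_four = narrow_filter(doubled, lambda x: x > 4)
```

The number of partitions stays the same after each step, and each output partition can be computed independently of the others.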

In a wide transformation, records in a single partition of the child RDD can be computed using data across parent ...
