February 2019
Beginner to intermediate
382 pages
10h 1m
English
This section provides a quick introduction to programming with Python in Spark. We will start with the basic data structures in Spark.
The Resilient Distributed Dataset (RDD) is the primary data structure in Spark. It is a distributed collection of objects and has the following three main features:
The RDD was the main data structure in Spark before version 2.0. Since then, it has been replaced by the DataFrame, which is also ...
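To make the phrase "distributed collection of objects" more concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark itself: the helper names `split_into_partitions`, `map_partitions`, and `collect_partitions` are hypothetical, and everything runs in one process. It only illustrates the idea that a collection is split into partitions, each partition is transformed independently, and the results are gathered back into one list. Spark's own slicing of the data may differ from the one used here.

```python
from typing import Callable, List


def split_into_partitions(data: List[int], num_partitions: int) -> List[List[int]]:
    """Split data into contiguous chunks of near-equal size (hypothetical helper)."""
    size, extra = divmod(len(data), num_partitions)
    partitions: List[List[int]] = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions receive one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def map_partitions(partitions: List[List[int]],
                   func: Callable[[int], int]) -> List[List[int]]:
    """Apply func to every element, one partition at a time.

    In a real cluster each partition could be processed on a different worker;
    here the loop simply runs sequentially.
    """
    return [[func(x) for x in part] for part in partitions]


def collect_partitions(partitions: List[List[int]]) -> List[int]:
    """Gather all partitions back into a single local list."""
    return [x for part in partitions for x in part]


data = list(range(1, 11))
partitions = split_into_partitions(data, 3)
squared = map_partitions(partitions, lambda x: x * x)
result = collect_partitions(squared)

print(partitions)
print(result)

# With PySpark installed and a SparkContext `sc` available (an assumption,
# not shown here), the corresponding calls would be roughly:
#   rdd = sc.parallelize(data, 3)
#   rdd.map(lambda x: x * x).collect()
```

The real RDD API hides the partitioning step: you describe the transformation once, and Spark decides where each partition is processed.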