July 2017
Intermediate to advanced
796 pages
18h 55m
English
Spark core is the general execution engine that underlies the Spark platform; all other functionality is built on top of it. It contains the basic Spark functionality required for running jobs and needed by the other components. It provides in-memory computing and the ability to reference datasets in external storage systems. Its most important abstraction is the Resilient Distributed Dataset (RDD).
In addition, Spark core contains logic for accessing various filesystems and data stores, such as HDFS, Amazon S3, HBase, Cassandra, and relational databases. Spark core also provides the fundamental functions for networking, security, scheduling, and data shuffling that are needed to build a highly scalable, fault-tolerant platform for distributed computing.
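As a rough illustration of the two roles described above (in-memory computing and referencing external data), the following minimal sketch builds one RDD from a local collection and another from a file path. It assumes Spark is on the classpath and runs in local mode; the object name `RddSketch`, the application name, and the path `data/input.txt` are hypothetical placeholders, not taken from the book.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the names here are illustrative only.
object RddSketch {
  def main(args: Array[String]): Unit = {
    // Run Spark inside this JVM, using all available local cores.
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // An RDD built from an in-memory collection.
      val numbers = sc.parallelize(1 to 10)

      // Transformations are lazy; cache() asks Spark to keep the
      // filtered result in memory once it has been computed.
      val evens = numbers.filter(_ % 2 == 0).cache()

      // reduce is an action, so it triggers the actual computation.
      val total = evens.map(_ * 2).reduce(_ + _)
      println(s"Sum of doubled even numbers: $total")

      // An RDD that references a dataset in external storage.
      // The path is a placeholder; an hdfs:// or s3a:// URI could be
      // used instead, given a suitably configured cluster. No action
      // is called on it here, so the file is never actually read.
      val lines = sc.textFile("data/input.txt")
    } finally {
      sc.stop()
    }
  }
}
```

The same `SparkContext` is the entry point in both cases, which is one way to see why the other Spark components build on the core engine.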