Hands-On Deep Learning with Apache Spark

by Guglielmo Iozzia
January 2019
Intermediate to advanced content level
322 pages
7h 29m
English
Packt Publishing
Content preview from Hands-On Deep Learning with Apache Spark

RDD programming

In general, every Spark application is a driver program that runs the user-defined logic and executes parallel operations on a cluster. In accordance with the previous definition, the main abstraction provided by the core Spark framework is the RDD (Resilient Distributed Dataset): an immutable, distributed collection of data that is partitioned across the machines in a cluster and can be operated on in parallel.
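By way of illustration (not an excerpt from the book), the following minimal Scala sketch creates an RDD by parallelizing a local collection; the local[*] master, the application name, and the partition count of 4 are assumptions made purely for this example.

    import org.apache.spark.{SparkConf, SparkContext}

    object RddCreationSketch {
      def main(args: Array[String]): Unit = {
        // Local master and app name are assumptions for this sketch only.
        val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Distribute a local Scala collection across the cluster as an RDD,
        // split into 4 partitions (an arbitrary choice for illustration).
        val numbers = sc.parallelize(1 to 100, numSlices = 4)
        println(s"Partitions: ${numbers.getNumPartitions}")

        sc.stop()
      }
    }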

Two types of operations are available on an RDD:

  • Transformations
  • Actions

A transformation is an operation on an RDD that produces another RDD, while an action is an operation that triggers a computation and returns a value to the driver program or persists the result to a storage system. Transformations are lazy: they aren't ...
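As a minimal sketch, assuming the sc SparkContext created in the previous snippet, the Scala code below illustrates the distinction: map and filter are transformations that lazily build new RDDs, while count and collect are actions that force the computation and bring results back to the driver. The sample data is made up for this example.

    // Assumes an existing SparkContext named sc (see the previous sketch).
    val words = sc.parallelize(Seq("spark", "rdd", "deep", "learning"))

    // Transformations: each call returns a new RDD; nothing is computed yet (lazy).
    val upperCased = words.map(_.toUpperCase)
    val longWords = upperCased.filter(_.length > 4)

    // Actions: these trigger the actual computation and return values to the driver.
    val count = longWords.count()        // 2 ("SPARK", "LEARNING")
    val results = longWords.collect()    // Array("SPARK", "LEARNING")
    println(s"$count long words: ${results.mkString(", ")}")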
