Python with Apache Spark

Apache Spark is a computing framework that works on top of HDFS and provides an alternative way of computing, similar to MapReduce. It was developed by the AMPLab at UC Berkeley. Spark performs most of its computation in memory, which makes it much faster than MapReduce and well suited to machine learning, since it handles iterative workloads very well.


Spark uses the programming abstraction of RDDs (Resilient Distributed Datasets), in which data is logically divided into partitions distributed across the cluster, and transformations are performed on top of this data.
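To make the abstraction concrete, here is a minimal pure-Python sketch of the RDD idea: data split into partitions, with transformations recorded lazily and only evaluated when a result is requested. This is an illustration of the model, not Spark itself; real code would use the `pyspark` API (e.g. `sc.parallelize(...)` with `map`, `filter`, and `collect`).

```python
class MiniRDD:
    """Toy illustration of an RDD (not Spark): data is split into
    partitions, transformations are recorded lazily, and everything
    is evaluated per-partition only when collect() is called."""

    def __init__(self, data, num_partitions=2):
        data = list(data)
        size = -(-len(data) // max(1, num_partitions))  # ceiling division
        self.partitions = [data[i:i + size] for i in range(0, len(data), size)]
        self.ops = []  # deferred transformations

    def map(self, fn):
        self.ops.append(("map", fn))
        return self

    def filter(self, pred):
        self.ops.append(("filter", pred))
        return self

    def collect(self):
        out = []
        for part in self.partitions:  # each partition is processed independently
            rows = part
            for kind, fn in self.ops:
                if kind == "map":
                    rows = [fn(r) for r in rows]
                else:
                    rows = [r for r in rows if fn(r)]
            out.extend(rows)
        return out


rdd = MiniRDD(range(1, 9), num_partitions=4)
result = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()
print(result)  # [4, 16, 36, 64]
```

Because each partition is processed independently, the same chain of transformations could run on many machines in parallel, which is exactly what Spark's scheduler arranges for real RDDs.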

Python is one of the languages that can be used to interact with Spark, through its PySpark API.
