August 2019
Intermediate to advanced
560 pages
13h 41m
English
Apache Spark is one of the most popular big data tools. It is a second-generation computing engine that works with Hadoop as an alternative to MapReduce, and it uses in-memory computing to deliver high-performance analytics. Its major components include Spark SQL, Spark Streaming, SparkR, the Machine Learning Library (MLlib), and GraphX. Spark is written in Scala and also provides APIs for Java, Python, and R. The following diagram depicts the Spark ecosystem:
Spark provides a hybrid processing framework, which means it supports both batch processing and stream processing. ...
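The difference between the two modes can be illustrated with a minimal sketch in plain Python. This is not the Spark API: the function names (`count_words`, `run_batch`, `run_stream`) and the sample data are hypothetical, and the micro-batch loop is only a simplified stand-in for the way a streaming engine might process unbounded input in small chunks. The point it shows is that the same transformation can be applied either to a complete dataset at once or incrementally as records arrive.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Shared transformation: count word occurrences in a group of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(dataset: List[str]) -> Counter:
    """Batch mode: the whole bounded dataset is available up front."""
    return count_words(dataset)


def run_stream(source: Iterable[str], batch_size: int = 2) -> Iterator[Counter]:
    """Stream mode (illustrative): group arriving records into micro-batches
    and yield a snapshot of the running totals after each one."""
    totals: Counter = Counter()
    buffer: List[str] = []
    for record in source:
        buffer.append(record)
        if len(buffer) == batch_size:
            totals += count_words(buffer)
            buffer = []
            yield Counter(totals)  # copy, so later updates do not alter it
    if buffer:  # flush the final, possibly smaller, micro-batch
        totals += count_words(buffer)
        yield Counter(totals)


# Hypothetical sample input
lines = ["spark supports batch", "spark supports streams", "batch and streams"]

batch_result = run_batch(lines)
stream_results = list(run_stream(iter(lines), batch_size=2))

print(batch_result)
print(stream_results[-1])
```

Once the stream has consumed every record, its final snapshot matches the batch result; the streaming path simply makes intermediate totals available earlier.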