E. Installing Apache Spark

As mentioned in Chapter 8, “Hadoop YARN Applications,” Apache Spark is a fast, in-memory data processing engine. Spark differs from the classic MapReduce model in two ways. First, Spark holds intermediate results in memory rather than writing them to disk. Second, Spark supports more than just MapReduce functions, which greatly expands the set of analyses that can be executed over HDFS data stores. Spark provides APIs in Scala, Java, and Python, and it has been fully integrated to run under YARN.
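The first difference can be illustrated with a small plain-Python sketch. It does not use Spark or Hadoop at all; it is only a toy model under stated assumptions. The sample data and the function names (`tokenize`, `count_via_disk`, `count_in_memory`) are hypothetical and chosen for illustration. One function writes its intermediate result to a temporary file between stages, in the manner of classic MapReduce, while the other keeps the intermediate result in memory and reuses it for more than one operation.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical sample data standing in for lines read from HDFS.
LINES = ["spark runs on yarn", "yarn schedules spark jobs"]


def tokenize(lines):
    """Stage 1: split every line into words."""
    return [word for line in lines for word in line.split()]


def count_via_disk(lines):
    """MapReduce style: stage 1 output goes to disk and is read back by stage 2."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(tokenize(lines), handle)
        path = handle.name
    try:
        with open(path) as handle:
            words = json.load(handle)
    finally:
        os.remove(path)  # clean up the intermediate file
    return Counter(words)


def count_in_memory(lines):
    """In-memory style: stage 1 output stays in memory and feeds several steps."""
    words = tokenize(lines)        # intermediate result kept in memory
    counts = Counter(words)        # a reduce-like aggregation
    distinct = sorted(set(words))  # a second operation on the same data
    return counts, distinct


disk_counts = count_via_disk(LINES)
memory_counts, distinct_words = count_in_memory(LINES)
print(disk_counts == memory_counts)
print(distinct_words)
```

Both approaches produce the same counts. The in-memory version avoids the round trip to disk and can run a second operation on the intermediate words without recomputing or rereading them, which is roughly the benefit the text attributes to Spark.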

As of this writing, Apache Spark has not been fully integrated into the Hortonworks HDP Hadoop distribution version 2.2.4. The next release will include Spark as a fully integrated Ambari and HDP component.

As demonstrated ...
