E. Installing Apache Spark
As mentioned in Chapter 8, “Hadoop YARN Applications,” Apache Spark is a fast, in-memory data processing engine. Spark differs from the classic MapReduce model in two ways. First, Spark holds intermediate results in memory rather than writing them to disk. Second, Spark supports more than just MapReduce functions, which greatly expands the set of analyses that can be run over data stored in HDFS. Spark also provides APIs in Scala, Java, and Python, and it has been fully integrated to run under YARN.
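The first difference can be illustrated with a small conceptual sketch. The code below is plain Python, not the Spark API, and every name in it (LINES, tokenize, count, disk_staged, in_memory) is hypothetical and chosen only for illustration. It runs the same two-stage word count twice: once writing the intermediate result to a file and reading it back, as a MapReduce-style job would between stages, and once passing the intermediate result directly in memory.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical sample input, used only for this illustration.
LINES = ["spark keeps data in memory", "mapreduce writes data to disk"]


def tokenize(lines):
    """Stage 1: split each line into words."""
    return [word for line in lines for word in line.split()]


def count(words):
    """Stage 2: count occurrences of each word."""
    return dict(Counter(words))


def disk_staged(lines, workdir):
    """MapReduce-style: stage 1 output goes to disk and is re-read by stage 2."""
    path = os.path.join(workdir, "stage1.json")
    with open(path, "w") as f:
        json.dump(tokenize(lines), f)
    with open(path) as f:
        words = json.load(f)
    return count(words)


def in_memory(lines):
    """In-memory style: the intermediate list is handed straight to stage 2."""
    return count(tokenize(lines))


with tempfile.TemporaryDirectory() as workdir:
    staged_result = disk_staged(LINES, workdir)
memory_result = in_memory(LINES)
```

Both paths produce the same counts; the in-memory path simply avoids the write and read between stages. On a real cluster that saving is repeated for every stage of a multi-stage or iterative job.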
As of this writing, Apache Spark has not been fully integrated into the Hortonworks HDP Hadoop distribution version 2.2.4. The next release will include Spark as a fully integrated Ambari and HDP component.
As demonstrated ...