Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem by Douglas Eadline

E. Installing Apache Spark

As mentioned in Chapter 8, “Hadoop YARN Applications,” Apache Spark is a fast, in-memory data processing engine. Spark differs from the classic MapReduce model in two ways. First, Spark holds intermediate results in memory rather than writing them to disk. Second, Spark supports more than just MapReduce functions, which greatly expands the set of analyses that can be run over data stored in HDFS. Spark also provides APIs in Scala, Java, and Python, and it is fully integrated to run under YARN.
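The two differences described above can be illustrated with a toy sketch in plain Python. This is not Spark code and does not use Spark's API. The sample lines and the names `lines`, `words`, `merge`, `word_counts`, and `distinct_sorted` are hypothetical, chosen only for illustration. The intermediate list of words is built once, kept in memory, and then reused by both a MapReduce-style count and an operation that falls outside the map/reduce pattern.

```python
from functools import reduce

# Hypothetical sample input; a real job would read this from HDFS.
lines = [
    "spark holds intermediate results in memory",
    "mapreduce writes intermediate results to disk",
]

# Stage 1: tokenize once and keep the intermediate result in memory.
words = [word for line in lines for word in line.split()]

# Analysis A (MapReduce-style): map each word to (word, 1),
# then reduce the pairs by key into a dictionary of counts.
def merge(counts, pair):
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

word_counts = reduce(merge, ((w, 1) for w in words), {})

# Analysis B (beyond map/reduce): reuse the same in-memory words
# for a distinct-and-sort step without re-reading the input.
distinct_sorted = sorted(set(words))

print(word_counts["intermediate"])  # 2
print(distinct_sorted[:3])          # ['disk', 'holds', 'in']
```

In a classic MapReduce workflow, each of these analyses would typically be a separate job that reads its input from disk. Here the second analysis starts from data that is already in memory, which is the behavior the paragraph above attributes to Spark.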

As of this writing, Apache Spark has not been fully integrated into the Hortonworks HDP Hadoop distribution version 2.2.4. The next release will include Spark as a fully integrated Ambari and HDP component.

As demonstrated ...
