Chapter 5. Iterative Computation with Spark

In the previous chapter, we saw how Samza enables near real-time stream data processing within Hadoop. This is quite a step away from the traditional batch processing model of MapReduce, but it still follows the model of providing a well-defined interface against which business logic tasks can be implemented. In this chapter we will explore Apache Spark, which can be viewed both as a framework on which applications can be built and as a processing framework in its own right. Not only are applications being built on Spark, but entire components within the Hadoop ecosystem are also being reimplemented to use Spark as their underlying processing framework. In particular, we will cover the following ...

