O'Reilly logo

Hadoop Application Architectures by Gwen Shapira, Jonathan Seidman, Ted Malaska, Mark Grover

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 7. Near-Real-Time Processing with Hadoop

Through much of its development, Hadoop has been thought of as a batch processing system. The most common processing pattern has been loading data into Hadoop, followed by processing of that data through a batch job, typically implemented in MapReduce. A very common example of this type of processing is the extract-transform-load (ETL) processing pipeline—loading data into Hadoop, transforming it in some way to meet business requirements, and then moving it into a system for further analysis (for example, a relational database).

Hadoop’s ability to efficiently process large volumes of data in parallel provides great benefits, but there are also a number of use cases that require more “real time” (or more precisely, near-real-time, as we’ll discuss shortly) processing of data—processing the data as it arrives, rather than through batch processing. This type of processing is commonly referred to as stream processing.

Fortunately, this need for more real-time processing is being addressed with the integration of new tools into the Hadoop ecosystem. These stream processing tools include systems like Apache Storm, Apache Spark Streaming, Apache Samza, or even Apache Flume via Flume interceptors. In contrast to batch processing systems such as MapReduce, these tools allow you to build processing flows that continuously process incoming data; these flows will process data as long as they remain running, as opposed to a batch process that ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required