Learning Real-time Processing with Spark Streaming by Sumit Gupta

Chapter 2. Architecture and Components of Spark and Spark Streaming

Apache Hadoop revolutionized data processing and storage by enabling fault-tolerant, distributed processing of large datasets (terabytes to petabytes) on commodity machines. Built on the MapReduce programming model (http://en.wikipedia.org/wiki/MapReduce), Hadoop provided a low-cost, reliable solution for batch processing (http://en.wikipedia.org/wiki/Batch_processing) of large data.

Hadoop was a perfect fit for most varied and complex use cases, but a large set of use cases, such as real-time data processing and computation, iterative data processing (machine learning), and graph processing, were not possible with Hadoop and were still a distant ...
