Learning Real-time Processing with Spark Streaming by Sumit Gupta

Architecture of Spark

In this section, we discuss why the Spark framework is needed in comparison to Hadoop, and then describe the architecture of Spark, which is also referred to as the Core Spark Framework.

Spark versus Hadoop

Apache Spark is an open source cluster computing framework that appears similar to Apache Hadoop but addresses workloads that Hadoop handles poorly. Hadoop performs well for the majority of large-scale, distributed data processing on commodity hardware, but it falls short in two scenarios:

  • Iterative and interactive computations and workloads: For example, machine learning algorithms that reuse intermediate or working datasets across multiple parallel operations.
  • Real-time data processing: Hadoop was mainly built for batch ...
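
To make the first scenario concrete, here is a minimal sketch in plain Python (not Spark or Hadoop code) of why reusing a working dataset matters for iterative workloads. The `DiskBackedDataset` class, its `disk_reads` counter, and the two `iterate_*` functions are hypothetical names introduced only for illustration; the "disk read" simply stands in for an expensive load from distributed storage.

```python
# Illustrative only: plain Python, not Spark or Hadoop code.
# All names below are hypothetical and exist just for this sketch.


class DiskBackedDataset:
    """A dataset that counts how often it is read from 'disk'."""

    def __init__(self, values):
        self._values = list(values)
        self.disk_reads = 0

    def load(self):
        # Stand-in for an expensive read from distributed storage.
        self.disk_reads += 1
        return list(self._values)


def iterate_without_reuse(dataset, iterations):
    """Reload the working set on every pass, as a chain of batch jobs would."""
    estimate = 0.0
    for _ in range(iterations):
        points = dataset.load()  # one storage read per iteration
        mean = sum(points) / len(points)
        estimate += 0.5 * (mean - estimate)  # move halfway toward the mean
    return estimate


def iterate_with_reuse(dataset, iterations):
    """Load the working set once and keep it in memory across passes."""
    points = dataset.load()  # single storage read
    estimate = 0.0
    for _ in range(iterations):
        mean = sum(points) / len(points)
        estimate += 0.5 * (mean - estimate)
    return estimate
```

Both functions compute the same estimate. Over five iterations, however, the first performs five simulated storage reads and the second performs one, which is the kind of overhead that grows with every pass of an iterative algorithm.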
