Apache Mesos Essentials by Dharmesh Kakadia

Spark Streaming

We already know that Spark can be used to process large amounts of data. Spark Streaming is an extension of the Spark API that enables the processing of streaming data. It supports a wide variety of input data sources, including Twitter, HDFS, Kafka, Flume, Akka actors, TCP sockets, and ZeroMQ. Spark Streaming breaks up the input data stream into small batches, and this discretized stream is then processed by the Spark program. The processed batches can be routed for further processing or stored in HDFS, databases, and so on.
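The batching idea described above can be illustrated without Spark at all. The following is a minimal plain-Python sketch, not Spark Streaming's API: the names `discretize`, `word_count`, and `process_stream`, the one-second batch interval, and the sample events are all assumptions made for illustration. It groups timestamped lines into fixed-length batches and runs the same computation on each batch in turn.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (arrival time in seconds, text line).
Event = Tuple[float, str]


def discretize(events: Iterable[Event], batch_interval: float) -> Dict[int, List[str]]:
    """Group events into consecutive batches of batch_interval seconds."""
    batches: Dict[int, List[str]] = {}
    for timestamp, line in events:
        # Events in [k * interval, (k + 1) * interval) land in batch k.
        index = int(timestamp // batch_interval)
        batches.setdefault(index, []).append(line)
    return batches


def word_count(lines: List[str]) -> Counter:
    """The per-batch computation: count the words in one batch."""
    return Counter(word for line in lines for word in line.split())


def process_stream(events: Iterable[Event], batch_interval: float) -> List[Tuple[int, Counter]]:
    """Apply the same computation to every batch, in batch order."""
    batches = discretize(events, batch_interval)
    return [(index, word_count(batches[index])) for index in sorted(batches)]


# Made-up sample input, for illustration only.
events = [(0.2, "mesos spark"), (0.9, "spark"), (1.4, "kafka spark"), (3.1, "mesos")]
results = process_stream(events, batch_interval=1.0)
for index, counts in results:
    print(index, dict(counts))
```

Each batch is processed independently, so the per-batch results could in turn be forwarded to another stage or written to storage. Note that this sketch simply skips intervals in which no events arrived.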

The basic abstraction in Spark Streaming is the DStream, or discretized stream (http://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf). Internally, DStreams are represented as a sequence of ...
