July 2017
Intermediate to advanced
796 pages
18h 55m
English
Spark Streaming uses Spark Core's fast scheduling to perform streaming analytics on real-time data ingested from sources such as HDFS, Kafka, Flume, Twitter, ZeroMQ, and Kinesis. It processes data in chunks called micro-batches. Each stream is represented as a DStream (discretized stream), which is a sequence of RDDs, so Spark Streaming can apply the same transformations and actions to it as the regular RDD API in Spark Core. Spark Streaming can recover from failures automatically, using techniques such as checkpointing and write-ahead logs. It can also be combined with other Spark components in a single program, unifying real-time processing with machine learning, SQL, and graph operations.
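The micro-batch idea can be illustrated with a toy model in plain Scala. This is not Spark's API. It only mimics the shape of a DStream: incoming events are grouped into fixed batch intervals, and each batch is processed with ordinary collection operations. In Spark itself, the same role would be played by a `StreamingContext` created with a batch duration, and by DStream transformations such as `flatMap` and `reduceByKey`. All names below (`MicroBatchSketch`, `Event`, `toBatches`, `wordCount`) are hypothetical and exist only for this sketch.

```scala
// A toy, Spark-free model of micro-batching (all names are hypothetical).
// Events are grouped into fixed-length batch intervals, and each batch is
// processed with ordinary collection operations. This mirrors how a DStream
// is a sequence of RDDs, one per batch interval.
object MicroBatchSketch {
  // Hypothetical event: arrival time in milliseconds plus a line of text.
  final case class Event(timeMs: Long, line: String)

  // Assign each event to the batch whose interval contains its arrival time.
  // Returns (batchStartMs, lines) pairs in time order. Unlike Spark Streaming,
  // which produces a (possibly empty) RDD for every interval, this sketch
  // simply skips intervals that received no events.
  def toBatches(events: Seq[Event], batchIntervalMs: Long): Seq[(Long, Seq[String])] =
    events
      .groupBy(e => e.timeMs / batchIntervalMs)
      .toSeq
      .sortBy(_._1)
      .map { case (idx, es) => (idx * batchIntervalMs, es.sortBy(_.timeMs).map(_.line)) }

  // Per-batch word count, with the same flatMap / group-by-key / count shape
  // one would apply to each RDD of a DStream.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event(100L, "spark streaming"),
      Event(900L, "spark core"),
      Event(1500L, "kafka spark"),
      Event(2100L, "flume")
    )
    // A 1-second batch interval (assumed value, chosen for illustration).
    for ((start, lines) <- toBatches(events, 1000L))
      println(s"batch@$start ms: ${wordCount(lines).toSeq.sorted.mkString(", ")}")
  }
}
```

Running `main` prints one line per batch:

```
batch@0 ms: (core,1), (spark,2), (streaming,1)
batch@1000 ms: (kafka,1), (spark,1)
batch@2000 ms: (flume,1)
```

Each batch is computed independently here. Stateful operations across batches, such as running totals, would need extra machinery that this sketch deliberately leaves out.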