Spark Streaming

When studying calculus, one thing becomes clear: life is not a discrete process but a continuous one. Life does not come in small packages; it is a continuously flowing stream.

As discussed in the first chapter, the fresher the information, the greater its value. Many modern machine learning applications need to compute their results in real time.

Spark Streaming is the Spark module for processing data streams. Much of Spark is built around the concept of the RDD (Resilient Distributed Dataset); Spark Streaming adds the concept of the DStream, or Discretized Stream. A DStream is a sequence of data arriving over time. It is important to emphasize that, internally, a DStream is a sequence of RDDs, one per time interval, hence the name discretized.
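The discretization idea can be modeled in a few lines of plain Python. This is a conceptual sketch, not Spark's API: it assumes each event carries a timestamp in seconds, groups events into fixed-length batch intervals (each batch standing in for one RDD), and applies a function to every batch, the way a DStream transformation is applied to each underlying RDD. The names `discretize` and `map_stream` are hypothetical. In Spark itself, the batch interval is fixed when the `StreamingContext` is created.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical event type for this sketch: (timestamp_in_seconds, payload)
Event = Tuple[float, str]


def discretize(events: Iterable[Event], batch_interval: float) -> List[List[str]]:
    """Group timestamped events into fixed-length micro-batches.

    Each returned list plays the role that one RDD plays inside a DStream:
    all records whose timestamp falls into the same batch interval.
    """
    buckets: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        # Index of the interval this event belongs to.
        buckets[int(ts // batch_interval)].append(payload)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    # One batch per interval, in time order; an interval with no events
    # yields an empty batch.
    return [buckets.get(i, []) for i in range(first, last + 1)]


def map_stream(batches: List[List[str]], fn: Callable[[str], str]) -> List[List[str]]:
    """Apply fn to every record of every batch (a per-batch 'map')."""
    return [[fn(record) for record in batch] for batch in batches]


if __name__ == "__main__":
    events = [(0.2, "a"), (0.7, "b"), (1.1, "c"), (3.4, "d")]
    batches = discretize(events, batch_interval=1.0)
    print(batches)                         # [['a', 'b'], ['c'], [], ['d']]
    print(map_stream(batches, str.upper))  # [['A', 'B'], ['C'], [], ['D']]
```

The stream is never treated as one unbounded collection: every operation runs on a finite batch, which is what lets the batch-oriented RDD machinery be reused for streaming.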

Just as RDDs support two types of operations, ...
