General transformations

The Spark Streaming API also exposes a general transform function that gives us access to the underlying RDD for each batch in the stream. That is, where higher-level functions such as map transform one DStream into another DStream, transform lets us apply an arbitrary RDD-to-RDD function to each batch. For example, we can use the RDD join operator to join each batch of the stream to an existing RDD that we computed separately from our streaming application (perhaps in Spark or in some other system).
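To make the per-batch join concrete, here is a minimal sketch that models the idea in plain Python rather than in Spark itself, so that it runs without a cluster. A stream is represented as a list of batches, and each batch as a list of (key, value) pairs. The helper names join and transform, and the sample data (user_countries, purchases), are assumptions made for illustration and are not part of the Spark API.

```python
from collections import defaultdict


def join(batch, reference):
    """Inner join of two lists of (key, value) pairs, in the spirit of RDD join."""
    index = defaultdict(list)
    for key, ref_value in reference:
        index[key].append(ref_value)
    # Keep only records whose key also appears in the reference data.
    return [(key, (value, ref_value))
            for key, value in batch
            for ref_value in index.get(key, [])]


def transform(stream, func):
    """Apply a batch-to-batch function to every batch of the modeled stream."""
    return [func(batch) for batch in stream]


# Hypothetical reference data, computed outside the streaming application.
user_countries = [("alice", "UK"), ("bob", "US")]

# Hypothetical stream: two batches of (user, product) purchase events.
purchases = [
    [("alice", "book"), ("carol", "pen")],
    [("bob", "laptop"), ("alice", "phone")],
]

# Each batch is joined independently against the same reference data.
enriched = transform(purchases, lambda batch: join(batch, user_countries))
for batch in enriched:
    print(batch)
```

In Spark itself, the same step would typically be a single call on the DStream that passes a function such as rdd => rdd.join(reference) to transform, where reference is the separately computed RDD. Note that an inner join drops records with no match in the reference data, as happens to the "carol" event in this sketch.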

The full list of transformations, along with further information on each of them, is provided in the Spark documentation at http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams.
