O'Reilly logo

Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

General transformations

The Spark Streaming API also exposes a general transform function that gives us access to the underlying RDD for each batch in the stream. That is, where the higher-level functions such as map transform a DStream to another DStream, transform allows us to apply functions from an RDD to another RDD. For example, we can use the RDD join operator to join each batch of the stream to an existing RDD that we computed separately from our streaming application (perhaps, in Spark or some other system).

The full list of transformations and further information on each of them is provided in the Spark documentation at http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required