Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Actions

Some of the operators we have seen in Spark Streaming, such as count, are actions in the batch RDD case but not on DStreams. Spark Streaming nevertheless has its own concept of actions: output operators that, when invoked, trigger computation on the DStream. They are as follows:

  • print: This prints the first 10 elements of each batch to the console and is typically used for debugging and testing.
  • saveAsObjectFiles, saveAsTextFiles, and saveAsHadoopFiles: These functions output each batch to a Hadoop-compatible filesystem with a filename (if applicable) derived from the batch start timestamp.
  • foreachRDD: This operator is the most generic and allows us to apply any arbitrary processing to the RDDs within each batch of a DStream. It is ...
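
The idea that output operators trigger computation, while other operators only describe it, can be illustrated with a small sketch. This is a plain Python toy model and not the Spark Streaming API: the class ToyDStream and its methods map, foreach_batch, and print_first are hypothetical names invented for illustration, and the stream is assumed to be a fixed list of batches rather than live data.

```python
class ToyDStream:
    """Toy stand-in for a DStream (hypothetical, not Spark's class):
    a fixed list of batches plus a list of pending transformations."""

    def __init__(self, batches, transforms=None, log=None):
        self.batches = [list(b) for b in batches]
        self.transforms = list(transforms or [])
        # Shared log that records every time a transformation really runs.
        self.log = log if log is not None else []

    def map(self, fn):
        # Transformation: only recorded here, nothing is computed yet.
        return ToyDStream(self.batches, self.transforms + [fn], self.log)

    def _compute(self, batch):
        # Apply the recorded transformations to one batch.
        for fn in self.transforms:
            self.log.append("ran transform")
            batch = [fn(x) for x in batch]
        return batch

    def foreach_batch(self, handler):
        # Output operator: forces computation of each batch and hands
        # the result, with its batch index, to arbitrary user code.
        for index, batch in enumerate(self.batches):
            handler(self._compute(batch), index)

    def print_first(self, n=10):
        # Output operator for debugging: show the first n elements per batch.
        self.foreach_batch(lambda batch, index: print(f"batch {index}: {batch[:n]}"))


stream = ToyDStream([[1, 2, 3], [4, 5]])
doubled = stream.map(lambda x: x * 2)

# No output operator has been invoked yet, so no transformation has run.
runs_before_output = len(doubled.log)

# Invoking an output operator triggers the computation for every batch.
collected = []
doubled.foreach_batch(lambda batch, index: collected.append((index, batch)))
```

In this sketch, map merely records the function, and the work happens only once foreach_batch (or print_first, which is built on it) is called. That mirrors, in a very simplified way, the role the output operators above play for a DStream.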