Although some of the operators we have seen in Spark Streaming, such as count, are not actions as they are in the batch RDD case (on a DStream, count is a transformation that returns a new DStream), Spark Streaming does have the concept of actions on DStreams. Actions are output operators that, when invoked, trigger computation on the DStream. They are as follows:
- print: This prints the first 10 elements of each batch to the console and is typically used for debugging and testing.
- saveAsObjectFiles, saveAsTextFiles, and saveAsHadoopFiles: These functions output each batch to a Hadoop-compatible filesystem with a filename (if applicable) derived from the batch timestamp.
- foreachRDD: This operator is the most generic and allows us to apply any arbitrary processing to the RDDs within each batch of a DStream. It is ...
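
The output operators listed above can be sketched in a small local program. This is a minimal illustration under stated assumptions rather than a definitive implementation: the object name `OutputOperatorsSketch`, the one-second batch interval, the in-memory `queueStream` source, and the output prefix `/tmp/streaming-sketch/words` are all hypothetical choices made for the example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.mutable

// Hypothetical example object; the names and paths here are illustrative only.
object OutputOperatorsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("OutputOperatorsSketch")
    // Assumed batch interval of one second.
    val ssc = new StreamingContext(conf, Seconds(1))

    // An in-memory queue of RDDs stands in for a real input source.
    val queue = mutable.Queue[RDD[String]]()
    val lines = ssc.queueStream(queue)
    val words = lines.flatMap(_.split(" "))

    // On a DStream, count is a transformation: it returns a DStream[Long]
    // and does not trigger any computation by itself.
    val counts = words.count()

    // Output operator: prints the first 10 elements of each batch.
    counts.print()

    // Output operator: writes each batch under a name built from the
    // given prefix and the batch time (the prefix is an assumed path).
    words.saveAsTextFiles("/tmp/streaming-sketch/words")

    // Output operator: arbitrary processing on the RDD behind each batch.
    words.foreachRDD { (rdd, time) =>
      println(s"Batch at $time contains ${rdd.count()} words")
    }

    // Feed one batch of data, run briefly, then shut down.
    queue += ssc.sparkContext.parallelize(Seq("hello spark streaming", "hello dstream"))
    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop()
  }
}
```

Nothing is computed until at least one output operator has been registered and `ssc.start()` is called; the transformations alone (`flatMap`, `count`) only describe the processing to be applied to each batch.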