Let's create a simple word count example using Spark Streaming in Python. For this example, we will work with a DStream, the discretized stream: a sequence of small batches that together make up the stream of data. The example used for this section of the book can be found in its entirety at https://github.com/drabastomek/learningPySpark/blob/master/Chapter10/streaming_word_count.py.
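To make the idea of a discretized stream more concrete, here is a minimal sketch in plain Python; it does not use the Spark API. It cuts an incoming sequence of lines into small batches and counts the words in each batch separately. The names micro_batches, count_words, and incoming are hypothetical and chosen only for illustration, and the batches here are cut by a fixed number of lines, whereas Spark Streaming cuts them by a time interval.

```python
from collections import Counter
from itertools import islice


def micro_batches(lines, batch_size):
    """Yield consecutive lists of at most batch_size lines (hypothetical helper)."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the source is exhausted
        yield batch


def count_words(batch):
    """Count whitespace-separated words across all lines of a single batch."""
    return Counter(word for line in batch for word in line.split())


# Made-up input standing in for lines arriving over the network.
incoming = ["green green green", "blue blue", "gohawks", "green blue"]

# Assumption: two lines per batch; Spark would batch by time instead.
batch_counts = [count_words(batch) for batch in micro_batches(incoming, 2)]

for index, counts in enumerate(batch_counts):
    print(index, sorted(counts.items()))
```

Each batch is counted on its own, so the totals start again with every batch rather than accumulating over the whole stream.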
This word count example will use the Linux/Unix nc command, a simple utility that reads and writes data across network connections. We will use two different bash terminals: one using the nc command to send words to our computer's local port (9999), and one that will run Spark Streaming to receive those words and count ...