17.7 Spark Streaming: Counting Twitter Hashtags Using the pyspark-notebook Docker Stack
In this section, you’ll create and run a Spark streaming application that receives a stream of tweets on the topic(s) you specify and summarizes the top-20 hashtags in a bar chart that updates every 10 seconds. For the purposes of this example, you’ll use the Jupyter Docker container from the first Spark example.
There are two parts to this example. First, using the techniques from the “Data Mining Twitter” chapter, you’ll create a script that streams tweets from Twitter. Then, you’ll use Spark streaming in a Jupyter Notebook to read the tweets and summarize the hashtags.
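To make the "summarize the hashtags" step concrete before any Spark code appears, here is a minimal plain-Python sketch of the counting logic. It is not the Spark implementation this section builds; the function name `top_hashtags`, the regular expression and the sample tweets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed pattern: a '#' followed by one or more word characters.
HASHTAG = re.compile(r"#\w+")


def top_hashtags(tweets, n=20):
    """Return the n most common hashtags as (hashtag, count) pairs.

    Hashtags are lowercased so '#Python' and '#python' count together.
    This is a hypothetical helper, not part of Spark or Tweepy.
    """
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts.most_common(n)


# Made-up tweet texts standing in for a real stream.
sample_tweets = [
    "Learning #Python and #Spark today",
    "#python makes streaming fun",
    "Charts that update live #dataviz #Python",
]

for tag, count in top_hashtags(sample_tweets):
    print(f"{tag}: {count}")
```

In a streaming setting the same tally would be kept up to date as each new batch of tweets arrives, rather than being computed once over a fixed list.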
The two parts will communicate with one another via networking sockets—a low-level ...
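The socket hand-off between the two parts can be sketched with only the Python standard library. In this sketch a background thread plays the role of the tweet-streaming script and sends newline-terminated text, while the main thread plays the role of the reader. The function names (`serve_lines`, `read_lines`, `demo`), the loopback address and the sample lines are assumptions for illustration; the section's real reader is Spark streaming, not this client.

```python
import socket
import threading


def serve_lines(server_sock, lines):
    """Accept one connection and send each line, newline-terminated."""
    conn, _ = server_sock.accept()
    with conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))
    # Closing the connection signals end-of-stream to the reader.


def read_lines(host, port):
    """Connect to the sender and read lines until it closes the socket."""
    with socket.create_connection((host, port)) as client:
        with client.makefile("r", encoding="utf-8") as stream:
            return [line.rstrip("\n") for line in stream]


def demo(lines):
    """Run sender and reader in one process over the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        sender = threading.Thread(target=serve_lines, args=(server, lines))
        sender.start()
        received = read_lines("127.0.0.1", port)
        sender.join()
    return received


print(demo(["first tweet #spark", "second tweet #python"]))
```

Terminating each message with a newline gives the reader an unambiguous boundary between tweets, which is why the sender appends `"\n"` to every line.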