Part 1 – Acquiring the data with Spark Structured Streaming

To acquire the data, we use Tweepy, which provides an elegant Python client library for accessing the Twitter APIs. The APIs covered by Tweepy are extensive, and covering them in detail is beyond the scope of this book, but you can find the complete API reference on the official Tweepy website: http://tweepy.readthedocs.io/en/v3.6.0/cursor_tutorial.html.

You can install the Tweepy library directly from PyPI using the pip install command. The following command shows how to install it from a Notebook using the ! directive:

!pip install tweepy

Note: The Tweepy version used here is 3.6.0. Do not forget to restart the kernel after installing the library.
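After restarting the kernel, it can be useful to confirm which Tweepy version the Notebook actually picked up. The following is a minimal sketch, not part of the original text: it assumes Python 3.8 or later (for importlib.metadata), and the helper name installed_version and the constant EXPECTED_VERSION are hypothetical names introduced only for this illustration.

```python
from importlib import metadata

# Version mentioned in the note above; adjust if you install a different one.
EXPECTED_VERSION = "3.6.0"


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("tweepy")
if found is None:
    print("tweepy is not installed; run the pip install command first")
elif found != EXPECTED_VERSION:
    print("tweepy", found, "is installed, but the text assumes", EXPECTED_VERSION)
else:
    print("tweepy", found, "is installed")
```

Run this in a new cell once the kernel has restarted. A version mismatch is reported rather than raised as an error, since a newer Tweepy release may still work but can differ from the API described here.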

Architecture diagram for the data ...
