Exploring GDELT
A large part of the EDA journey is obtaining and documenting the sources of data, and GDELT content is no exception. After researching the GKG datasets, we found that even documenting which data sources we should be using was a challenge. The following sections provide a comprehensive listing of the resources we located, which are needed to run the examples.
Note
A cautionary note on download times: on a typical 5 Mbps home broadband connection, 2,000 GKG files take approximately 3.5 hours to download. Given that the English-language GKG feed alone comprises over 40,000 files, downloading the full set could take a while.
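To put a number on "a while", the following minimal sketch extrapolates linearly from the figures quoted in the note. It assumes that download time scales in proportion to the file count (file sizes and connection speed will vary in practice), and the function name `estimate_download_hours` is purely illustrative.

```python
def estimate_download_hours(total_files, sample_files=2000, sample_hours=3.5):
    """Linearly extrapolate download time from a measured sample.

    Assumption: every file takes roughly the same time to download,
    so total time is proportional to the number of files.
    """
    return total_files * sample_hours / sample_files


# Figures from the note: 2,000 files in about 3.5 hours, 40,000 files overall.
hours = estimate_download_hours(40_000)
print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")  # 70 hours (~2.9 days)
```

Under these assumptions, the English-language files alone would need roughly three days of continuous downloading, so it is worth fetching only the date range an example actually needs.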
GDELT GKG datasets
We should use the latest GDELT data feed, which is version 2.1 as of December 2016. ...