GDELT dataset
To validate our implementation, we use the GDELT dataset we analyzed in the previous chapter. We extracted all of the communities and spent some time inspecting the person names to check whether our community clustering was consistent. The full picture of the communities is reported in Figure 7. It was produced with Gephi, into which only the top few thousand connections were imported.
We first observe that most of the communities we detected align with the ones we could eyeball on a force-directed layout, giving a good confidence level about ...