Sourcing shared counts and content

Before we can begin exploring which features make content shareable, we need to get our hands on a fair amount of content, as well as data on how often it's shared. Unfortunately, securing this type of data has gotten more difficult in the last few years. When the first edition of this book came out in 2016, this data was easily obtainable. Today, there appear to be no free sources of this type of data, though if you are willing to pay, you can still find it.

Fortunately for us, I have a dataset that was collected from a now-defunct website, ruzzit.com. When it was active, this site tracked the most shared content over time, which is exactly what we require for this project.

We'll begin by ...
