Creating a replicable dataset from Twitter

In data mining, there are lots of variables. These aren't the parameters of the data mining algorithms; they are the methods of data collection, how the environment is set up, and many other factors. Being able to replicate your results is important because it lets you verify them or improve upon them.

Getting 80 percent accuracy on one dataset with algorithm X and 90 percent accuracy on another dataset with algorithm Y doesn't mean that Y is better. To compare the two properly, we need to test both on the same dataset under the same conditions. If you run the preceding code, you will get a different dataset from the one I created and used. The main reasons are that Twitter will return ...
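To make the point about comparing on the same dataset concrete, here is a minimal sketch using only the Python standard library. The toy records, the two stand-in "algorithms", and the names make_split and accuracy are all hypothetical and chosen for illustration; they are not the code or the Twitter data used elsewhere in this chapter. The idea is simply that both algorithms are scored on one fixed, seeded test split, so any difference in accuracy comes from the algorithms rather than from the data.

```python
import random


def make_split(records, test_fraction=0.25, seed=14):
    """Shuffle a copy of records with a fixed seed, then split it.

    The same seed and the same input always give the same split.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test):
    """Fraction of (feature, label) pairs that predict() gets right."""
    correct = sum(1 for x, y in test if predict(x) == y)
    return correct / len(test)


# Hypothetical toy data: (feature, label) pairs, not real tweets.
records = [(i, int(i % 10 >= 4)) for i in range(200)]

# One split, shared by both algorithms.
train, test = make_split(records)

# Stand-in for algorithm X: always predict the majority class seen in training.
majority = int(sum(y for _, y in train) * 2 >= len(train))


def predict_x(x):
    return majority


# Stand-in for algorithm Y: a rule based on the last digit of the feature.
def predict_y(x):
    return int(x % 10 >= 4)


acc_x = accuracy(predict_x, test)
acc_y = accuracy(predict_y, test)
print("X:", acc_x, "Y:", acc_y)
```

Because the seed is fixed, rerunning the script reproduces the same split, and the two accuracy figures can be compared directly.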
