The naive Bayes implementations

After working through a spam email detection example by hand, we will now, as promised, code it against a genuine dataset taken from the Enron email dataset (http://www.aueb.gr/users/ion/data/enron-spam/). The specific dataset we are using can be downloaded directly from http://www.aueb.gr/users/ion/data/enron-spam/preprocessed/enron1.tar.gz. You can either unzip it with an archive tool or run tar -xvzf enron1.tar.gz in the Terminal. The uncompressed folder includes a folder of ham email text files and a folder of spam email text files, as well as a summary description of the database:

    enron1/
      ham/
        0001.1999-12-10.farmer.ham.txt
        0002.1999-12-13.farmer.ham.txt
        ......
        ......
        5172.2002-01-11.farmer.ham.txt
      ...
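Once the archive is extracted, the two folders can be read into parallel lists of email texts and labels. The following is a minimal sketch rather than the book's own code: the helper name load_emails, the spam/ folder name (inferred from the description above), the 0/1 label encoding, and the latin-1 decoding are all assumptions made for illustration.

```python
from pathlib import Path


def load_emails(root):
    """Read every .txt file under root/ham and root/spam.

    Returns (emails, labels), where label 0 marks ham and 1 marks spam.
    """
    emails, labels = [], []
    # Folder names and the 0/1 encoding are assumptions for this sketch.
    for folder, label in (("ham", 0), ("spam", 1)):
        # Sort the paths so the order is reproducible between runs.
        for path in sorted(Path(root, folder).glob("*.txt")):
            # latin-1 maps every byte to a character, so decoding never fails.
            emails.append(path.read_text(encoding="latin-1"))
            labels.append(label)
    return emails, labels
```

Calling load_emails("enron1") from the directory that contains the extracted folder should then return one text and one label per email file. Keeping the labels in a separate list that lines up with the texts is a convenient shape for the training code that follows.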
