Implementing Naïve Bayes from scratch

After working through the spam email detection example by hand, we are now going to code it, as promised, on a genuine dataset taken from the Enron email dataset. The specific dataset we are using can be downloaded directly as the archive enron1.tar.gz. You can either unzip it using software, or run the following command in your terminal:

tar -xvzf enron1.tar.gz
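
If you would rather stay in Python, the same extraction can be sketched with the standard library's tarfile module. This is only an illustrative alternative to the command line, not part of the original walkthrough; the helper name extract_archive and its dest_dir parameter are hypothetical, and the archive is assumed to sit in the current working directory.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive and return its top-level entries.

    Hypothetical helper for illustration only.
    """
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    # Newer Python versions support extraction filters; use the safer "data"
    # filter when it is available and fall back to the default otherwise.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir, **kwargs)
    # Member names use "/" as separator, so the first component is the root.
    return sorted({name.split("/")[0] for name in names})


# Assumption: enron1.tar.gz has already been downloaded next to this script.
if Path("enron1.tar.gz").exists():
    print(extract_archive("enron1.tar.gz"))
```

The function returns the top-level names found in the archive, which is a quick way to confirm that the expected folder was unpacked.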

The uncompressed folder includes a folder of ham, or non-spam, email text files, and a folder of spam email text files, as well as a summary description of the database:

enron1/
  ham/
    0001.1999-12-10.farmer.ham.txt
    0002.1999-12-13.farmer.ham.txt
    ……
    ……
    5172.2002-01-11.farmer.ham.txt
...
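
Given this layout, reading the emails into memory could look roughly like the sketch below. Several details are assumptions rather than facts from the text: the spam folder is taken to be named spam, labels are encoded as 0 for ham and 1 for spam, the files are decoded as ISO-8859-1 (chosen because it accepts any byte sequence), and load_emails is a hypothetical helper name.

```python
import glob
import os


def load_emails(root_dir="enron1"):
    """Read every ham and spam text file under root_dir.

    Returns (emails, labels). Assumed encoding of labels: 0 = ham, 1 = spam.
    """
    emails, labels = [], []
    # Assumption: the two class folders are named "ham" and "spam".
    for folder, label in (("ham", 0), ("spam", 1)):
        pattern = os.path.join(root_dir, folder, "*.txt")
        # Sort so the order of emails is reproducible across runs.
        for filename in sorted(glob.glob(pattern)):
            # Assumption: ISO-8859-1 decodes every byte without raising.
            with open(filename, "r", encoding="ISO-8859-1") as infile:
                emails.append(infile.read())
            labels.append(label)
    return emails, labels


# Only runs if the dataset has been extracted into the working directory.
if os.path.isdir("enron1"):
    emails, labels = load_emails("enron1")
    print(len(emails), "emails loaded,", sum(labels), "labeled as spam")
```

Keeping the texts and labels in two parallel lists means the label at a given index always describes the email at the same index, which is convenient when counting words per class later on.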
