Machine Learning for OpenCV by Michael Beyeler

Loading the dataset

If you downloaded the latest code from GitHub, you will find a number of .tar.gz archives in the notebooks/data/chapter7 directory. Each archive contains raw emails (with fields such as To:, Cc:, and the text body), labeled either as spam (class label SPAM = 1) or as not spam (also known as ham, class label HAM = 0).

We build a list called sources, which pairs each raw data file with its class label:

In [1]: HAM = 0
...     SPAM = 1
...     datadir = 'data/chapter7'
...     sources = [
...        ('beck-s.tar.gz', HAM),
...        ('farmer-d.tar.gz', HAM),
...        ('kaminski-v.tar.gz', HAM),
...        ('kitchen-l.tar.gz', HAM),
...        ('lokay-m.tar.gz', HAM),
...        ('williams-w3.tar.gz', HAM),
...        ('BG.tar.gz', SPAM),
...        ('GP.tar.gz', SPAM),
...        ('SH.tar.gz', SPAM)
...     ]
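To make the role of the sources list concrete, here is a minimal sketch of how such (filename, label) pairs could be turned into labeled email texts using Python's standard tarfile module. The helper names read_archive and load_sources are hypothetical and not taken from the book, and the sketch assumes that every regular file inside an archive is one email and that decoding as Latin-1 is acceptable for raw mail.

```python
import os
import tarfile

HAM = 0
SPAM = 1


def read_archive(path):
    """Yield (member_name, text) for every regular file in a .tar.gz archive.

    Hypothetical helper: assumes each regular file holds one raw email.
    """
    with tarfile.open(path, 'r:gz') as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and links
            with tar.extractfile(member) as fh:
                # Raw emails are not guaranteed to be valid UTF-8;
                # Latin-1 maps every byte to a character, so decoding never fails.
                yield member.name, fh.read().decode('latin-1')


def load_sources(datadir, sources):
    """Return a list of (text, label) pairs for all archives in `sources`.

    Hypothetical helper: `sources` is a list of (filename, label) tuples.
    """
    data = []
    for filename, label in sources:
        path = os.path.join(datadir, filename)
        for _, text in read_archive(path):
            data.append((text, label))
    return data
```

With the variables defined above, a call such as load_sources(datadir, sources) would return one (text, label) pair per email, which is a convenient shape for building a feature matrix and a label vector later on.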

The ...
