Description of the SpamHam dataset

Before we present the actual dataset, here are a few real-world spam samples: 

Spam email with phishing example

Here is an example of regular or wanted mail, also known as ham:

A perfectly normal email from Lightbend

The following is a glimpse into the actual data used in our spam-ham classification task. It consists of two datasets:

  • inbox.txt: A ham dataset compiled from a small collection of regular emails from my Inbox folder
  • junk.txt: A spam dataset compiled from a small collection of junk email from my ...
