December 2018
Intermediate to advanced
318 pages
8h 28m
English
We will use a Kaggle dataset in the following example. The data is similar to what a mail server would gather. One effective way to collect spam email is to gather data from mail servers that have been shut down. Because the email accounts associated with such servers no longer exist, any email sent to them can be assumed to be spam.
The following screenshot shows a snippet of actual Kaggle data, taken from https://www.kaggle.com/uciml/sms-spam-collection-dataset:

We have modified the data to add labels (0 is ham and 1 is spam), as follows:
| Spam/Ham |
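The labeling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own code: the `label`/`text` column names, the two sample rows, and the helper `add_numeric_labels` are all hypothetical, and the column names in the downloaded Kaggle file may differ.

```python
import csv
import io

# Hypothetical two-row sample in a (label, text) layout.
# These rows are made up for illustration; they are not from the Kaggle file.
SAMPLE_CSV = """label,text
ham,See you at lunch tomorrow
spam,You have won a free prize. Call now to claim
"""

# Mapping stated in the text: 0 is ham and 1 is spam.
LABEL_TO_INT = {"ham": 0, "spam": 1}


def add_numeric_labels(csv_file):
    """Read label,text rows and return a list of (numeric_label, text) pairs."""
    reader = csv.DictReader(csv_file)
    rows = []
    for row in reader:
        # Normalize the label so that " Ham" or "SPAM" still map correctly.
        label = row["label"].strip().lower()
        rows.append((LABEL_TO_INT[label], row["text"]))
    return rows


labeled = add_numeric_labels(io.StringIO(SAMPLE_CSV))
for numeric_label, text in labeled:
    print(numeric_label, text)
```

To apply the same idea to the real data, pass an opened file object instead of the `io.StringIO` sample and adjust the column names to match the file's header. An unknown label raises a `KeyError`, which is a deliberate choice here so that unexpected values surface early rather than being silently mislabeled.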