October 2018
Spam detection is a common classification problem. In the following recipe, we have a corpus of raw text documents, each labeled as spam or not spam (ham). The data source is the SMS Spam Collection v.1, a public set of labeled SMS messages collected for mobile phone spam research.
| Application | File format | # Spam | # Ham | Total | Link |
|---|---|---|---|---|---|
| General | Plain text | 747 | 4,827 | 5,574 | http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/smsspamcollection.zip ... |
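To make the shape of such a labeled corpus concrete, here is a minimal sketch of reading it into (label, text) pairs and counting the two classes. It assumes each line holds a label (`ham` or `spam`) and a message separated by a tab, which is how this collection is commonly distributed. The function name `parse_sms_corpus` and the three sample messages are hypothetical, made up for illustration rather than taken from the dataset.

```python
from collections import Counter


def parse_sms_corpus(lines):
    """Split 'label<TAB>message' lines into (label, text) pairs.

    Assumes a tab-separated layout with the label first; blank lines are skipped.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # partition splits on the first tab only, so tabs inside a message survive
        label, _, text = line.partition("\t")
        pairs.append((label, text))
    return pairs


# Hypothetical sample lines, standing in for the real file contents
sample = [
    "ham\tAre we still meeting for lunch today?\n",
    "spam\tWINNER! Claim your free prize now, text WIN to 12345\n",
    "ham\tCall me when you get home.\n",
]

pairs = parse_sms_corpus(sample)
counts = Counter(label for label, _ in pairs)
print(counts["ham"], counts["spam"])
```

With the real file, the same function could be fed an open file handle instead of the `sample` list, and the resulting counts should match the spam and ham totals in the table above.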