Chapter 13

Detecting Text Message Spam

Neil McGuigan

University of British Columbia, Sauder School of Business, Canada

Acronyms

CSV - Comma-separated values

SMS - Short Message Service

UTF - Universal Character Set Transformation Format

13.1 Overview

This chapter is about text classification. Text classification is an important topic in data mining, because much of the information that people and organizations exchange, such as e-mail, text messages, and documents, is stored as text. We will build a RapidMiner process that learns the difference between spam messages and messages that you actually want to read. We will then apply the learned model to new messages to decide whether or not they are spam. Spam is a topic familiar to many, so it is a natural example to work with. The same techniques used to classify spam messages can be ...
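The chapter builds its classifier in RapidMiner. The "learn from labeled messages, then apply the model to new messages" idea can also be shown independently of any tool. The sketch below swaps in a multinomial naive Bayes classifier written in plain Python. It is not the chapter's RapidMiner process. The class `NaiveBayesSpamFilter`, the `tokenize` helper, the smoothing parameter `alpha`, and the four training messages are hypothetical and were invented for illustration. A real filter would be trained on a labeled SMS collection.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters/digits/apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Illustrative multinomial naive Bayes classifier (not RapidMiner's operator)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, messages, labels):
        # Learning step: count documents per class and words per class
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        # log P(class) + sum of log P(word | class) over known words
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][word] + self.alpha) / denom)
        return score

    def predict(self, text):
        # Apply step: pick the class with the highest log score
        return max(self.classes, key=lambda c: self.log_score(text, c))


# Hypothetical toy training data, invented for illustration only
messages = [
    "WINNER! Claim your free prize now",
    "Free entry: text WIN to claim cash",
    "Are we still meeting for lunch today?",
    "Can you send me the notes from class",
]
labels = ["spam", "spam", "ham", "ham"]

spam_filter = NaiveBayesSpamFilter().fit(messages, labels)
print(spam_filter.predict("Claim your free cash prize"))    # spam
print(spam_filter.predict("Are you free for lunch today"))  # ham
```

Naive Bayes is used here only because it is short to write and a common baseline for spam filtering. Working with log probabilities avoids numerical underflow when many word probabilities are multiplied together.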
