Chapter 13
Detecting Text Message Spam
Neil McGuigan
University of British Columbia, Sauder School of Business, Canada
Acronyms
CSV - Comma-separated values
SMS - Short Message Service
UTF - Universal Character Set Transformation Format
13.1 Overview
This chapter is about text classification, an important topic in data mining because most communications are stored as text. We will build a RapidMiner process that learns the difference between spam messages and messages that you actually want to read. We will then apply the learned model to new messages to decide whether or not they are spam. Spam is a topic familiar to many, so it is a natural medium to work in. The same techniques used to classify spam messages can be ...
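To make the learn-then-apply workflow concrete outside of RapidMiner, here is a minimal Python sketch. It is not the chapter's RapidMiner process. The choice of a multinomial Naive Bayes classifier with add-one smoothing is an assumption made for illustration, the function names `tokenize`, `train`, and `classify` are hypothetical, and the six training messages are invented examples rather than real data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages):
    """Learn per-label word counts from (text, label) pairs."""
    word_counts = {}          # label -> Counter of token frequencies
    label_counts = Counter()  # label -> number of training messages
    vocab = set()             # every token seen during training
    for text, label in messages:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return {"word_counts": word_counts,
            "label_counts": label_counts,
            "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    total_messages = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_messages in model["label_counts"].items():
        counts = model["word_counts"][label]
        # Add-one (Laplace) smoothing: every vocabulary word gets +1.
        denominator = sum(counts.values()) + vocab_size
        score = math.log(n_messages / total_messages)  # class prior
        for token in tokenize(text):
            if token in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training set; "ham" denotes a message you want to read.
TRAINING = [
    ("Win a free prize now", "spam"),
    ("Free entry claim your cash prize", "spam"),
    ("Urgent call now to claim your free reward", "spam"),
    ("Are we still meeting for lunch today", "ham"),
    ("Can you call me when you get home", "ham"),
    ("See you at the meeting tomorrow", "ham"),
]

# Step 1: learn a model from labeled messages.
model = train(TRAINING)

# Step 2: apply the learned model to a new, unlabeled message.
print(classify(model, "Claim your free prize"))
```

The two steps mirror the description above: a model is first learned from labeled messages, then applied to messages it has not seen. A training set this small is only useful for showing the mechanics; a real spam filter would need far more labeled messages.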