Chapter 11
Text Classification
Charu C. Aggarwal
IBM T. J. Watson Research Center, Yorktown Heights, NY, charu@us.ibm.com
ChengXiang Zhai
University of Illinois at Urbana-Champaign, Urbana, IL, czhai@cs.uiuc.edu
11.1 Introduction
The classification problem has been widely studied in the database, data mining, and information retrieval communities. It is defined as follows: given a set of records D = {X1,…,XN} and a set of k discrete values indexed by {1…k}, each representing a category, the task is to assign one category (equivalently, the corresponding index value) to each record Xi. The problem is usually solved with a supervised learning approach, where a set of training data records (i.e., records with ...
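To make the definition concrete, the following is a minimal sketch of the setup: a few labeled training records, k = 2 category indices, and a rule that assigns one index to each unlabeled record Xi. The function names `train` and `classify`, the toy sentences, and the word-overlap scoring rule are all assumptions chosen for illustration; they are not the methods discussed in this chapter.

```python
from collections import Counter


def train(labeled_records):
    """Build one word-count profile per category index from (text, label) pairs."""
    profiles = {}
    for text, label in labeled_records:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def classify(profiles, text):
    """Assign the category index whose profile best overlaps the record's words."""
    words = text.lower().split()
    # Score = summed training counts of the record's words.
    # Iterating categories in sorted order makes ties go to the smallest index.
    return max(sorted(profiles), key=lambda c: sum(profiles[c][w] for w in words))


# Hypothetical training data: category 1 = finance, category 2 = sports.
training = [
    ("stocks fell as markets closed", 1),
    ("the bank raised interest rates", 1),
    ("the team won the final match", 2),
    ("the striker scored two goals", 2),
]
profiles = train(training)

# Unlabeled records X1, X2; each receives exactly one category index.
records = ["markets react to interest rates", "the striker won the match"]
labels = [classify(profiles, x) for x in records]
print(labels)  # [1, 2]
```

The scoring rule here is deliberately crude. The point is only the shape of the task: labeled records go in during training, and every new record comes out with a single index from {1…k}.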