Chapter 7

Text Classification and Categorization



Text classification is the process of assigning text documents to two or more categories. The most common form is binary classification, in which every document in the corpus is assigned one of two categories. Text classification is often the first step in selecting a set of documents for further processing, or it can be the only step in text processing (e.g., spam filtering). The goal in text classification is not to extract ...
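To make binary classification concrete, here is a minimal sketch of a spam filter. It is an illustration under stated assumptions, not the method this chapter prescribes: the technique chosen (multinomial Naive Bayes with add-one smoothing), the toy corpus, the labels "spam" and "ham", and the function names train and classify are all invented for this example.

```python
import math
from collections import Counter

# Toy labeled corpus, invented for illustration (not taken from the book).
TRAIN = [
    ("win cash prize now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]


def train(examples):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Log prior: the share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for words
            # that never occurred under this label.
            numerator = word_counts[label][word] + 1
            denominator = total_words + len(vocab)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAIN)
```

With this toy corpus, classify("free cash prize", MODEL) returns "spam" and classify("project meeting report", MODEL) returns "ham". A real filter would need far more training data and a better tokenizer than whitespace splitting.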

