Chapter 7

Text Classification and Categorization

Contents

Preamble

Text classification is the process of assigning text documents to two or more categories. The most common form is binary classification, which assigns one of two categories to each document in the corpus. Text classification is often the first step in selecting a set of documents for further processing, or it can be the only step in text processing (e.g., spam filtering). The goal in text classification is not to extract ...
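To make the idea of binary classification concrete, here is a minimal sketch of a spam filter in Python. The chapter text available here does not name a specific algorithm, so the choice of a multinomial Naive Bayes classifier with add-one smoothing is an assumption made for illustration, as are the function names (`tokenize`, `train`, `classify`) and the four-document toy corpus.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Count documents and word occurrences per category.

    labeled_docs is a list of (text, label) pairs. Returns the per-label
    document counts, the per-label word counts, and the overall vocabulary.
    """
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus: two categories, so this is binary classification.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
MODEL = train(TRAINING)

print(classify("claim your free prize", MODEL))
print(classify("team meeting monday", MODEL))
```

Every document receives exactly one of the two labels, which is what distinguishes binary classification from the multi-category case. A real filter would need far more training data and a more careful tokenizer than this sketch uses.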
