Chapter 3

Conceptual Foundations of Text Mining and Preprocessing Steps

Contents

Preamble

After you determine what you want to do with text mining, you must compose the set of steps that you can follow to perform it. In the next chapter, you will learn about one of the most important innovations in text processing: the generalized vector-space model. Documents are represented by a string of values (a vector), in which each element in the string represents a unique word or word group contained among the list documents. Each element of the vector contains either a 1 or a 0 (for the ...

Get Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.