Chapter 5 Data Transformation

Introduction

Zipf’s Law

Term-By-Document Matrix

Text Filter Node

Frequency Weightings

Term Weightings

Filtering Documents

Concept Links

Summary

References

Introduction

You saw in Chapter 4 that the first task in text mining is to break the text down into a bag of words, or tokens. You then apply linguistic rules to identify parts of speech, synonyms, noun groups, attributes, and so on. Even before doing this, you can get a good sense of what a corpus is about by looking at the counts of the words extracted from it. A currently popular way to represent the prominent terms in a text visually is the word cloud, or text cloud. This is an easy and visually appealing ...
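The idea of gauging a corpus by its word counts can be illustrated with a minimal Python sketch. It is not tied to any particular text mining tool. The function name `term_counts`, the two-sentence sample corpus, and the crude regular-expression tokenizer are all assumptions made for illustration; a real pipeline would use the tokenization and linguistic rules discussed in Chapter 4.

```python
import re
from collections import Counter


def term_counts(documents):
    """Count how often each lowercased word token occurs across all documents."""
    counts = Counter()
    for text in documents:
        # Crude tokenizer (an assumption for this sketch): runs of letters
        # and apostrophes, after lowercasing the text.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(tokens)
    return counts


# Hypothetical two-document corpus, made up for this example.
corpus = [
    "The battery life is great and the screen is great.",
    "Battery drains fast, but the screen is sharp.",
]

counts = term_counts(corpus)

# The most frequent terms are the ones a word cloud would draw largest.
for term, n in counts.most_common(5):
    print(f"{term}: {n}")
```

Even on this toy corpus, function words such as "the" and "is" rank at the top alongside content words such as "battery" and "screen", which is one reason raw counts are usually filtered or weighted before they are interpreted.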
