O'Reilly logo

Text Mining and Analysis by Satish Garla, Murali Pagolu, Goutam Chakraborty

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 5 Data Transformation

Introduction

Zipf’s Law

Term-By-Document Matrix

Text Filter Node

Frequency Weightings

Term Weightings

Filtering Documents

Concept Links

Summary

References

Introduction

You saw in Chapter 4 that the first task of text mining analysis is to break down the text into a bag of words or tokens. Then, you apply various linguistic rules to identify the parts of speech, synonyms, noun groups, attributes, etc. Even before doing this exercise, a good understanding of what is being talked about in a corpus can be obtained by looking at the counts of the words extracted from the corpus. A current popular technique to visually represent prominent terms in text is using a word cloud or text cloud. This is an easy and visually appealing ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required