Text Mining and Analysis

Chapter 5 Data Transformation

Introduction

Zipf’s Law

Term-By-Document Matrix

Introduction

As Chapter 4 showed, the first task in text mining analysis is to break the text down into a bag of words, or tokens. You then apply linguistic rules to identify parts of speech, synonyms, noun groups, attributes, and so on. Even before that step, you can get a good sense of what a corpus is about by looking at the counts of the words extracted from it. A currently popular way to represent the prominent terms in a text visually is a word cloud, or text cloud. This is an easy and visually appealing ...
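The idea of reading a corpus through its word counts can be sketched in a few lines of Python. This is only a minimal illustration, not the book's method: the function name term_counts, the toy corpus, and the naive regular-expression tokenizer are all assumptions made for the example, and a real tokenizer would handle punctuation, numbers, and multi-word terms more carefully.

```python
import re
from collections import Counter


def term_counts(documents):
    """Count how often each lowercased word token occurs across all documents."""
    counts = Counter()
    for text in documents:
        # Naive tokenizer (an assumption for this sketch): runs of letters
        # or apostrophes, taken from the lowercased text.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(tokens)
    return counts


# Hypothetical toy corpus, for illustration only.
corpus = [
    "Text mining breaks text into tokens.",
    "Tokens are counted before any linguistic rules are applied.",
]

counts = term_counts(corpus)

# Print the most frequent terms; these are the ones a word cloud
# would typically draw in the largest type.
for term, n in counts.most_common(3):
    print(term, n)
```

The resulting term-to-count mapping is the kind of input a word cloud is drawn from, with each term's display size scaled by its count.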

Text Mining and Analysis by Dr. Goutam Chakraborty, Murali Pagolu, Satish Garla
