O'Reilly logo

Text Mining and Analysis by Satish Garla, Murali Pagolu, Goutam Chakraborty

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 6 Clustering and Topic Extraction

Introduction

What Is Clustering?

Singular Value Decomposition and Latent Semantic Indexing

Topic Extraction

Scoring

Summary

References

Introduction

In Chapters 1 through 5, you learned how to take a collection of documents and convert them into a vector space model that represents features of each document using numeric values. In this chapter, we discuss how to take that vector space model and assign each document to a small number of groups, called clusters. The basic idea is that documents within a cluster should be similar to each other, and documents in different clusters should be dissimilar to each other. The similarity between two documents is based on the similarity of features (such as terms ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required