Text Mining and Analysis

Chapter 6 Clustering and Topic Extraction

Introduction

What Is Clustering?

Singular Value Decomposition and Latent Semantic Indexing

Introduction

In Chapters 1 through 5, you learned how to take a collection of documents and convert them into a vector space model that represents features of each document using numeric values. In this chapter, we discuss how to take that vector space model and organize the documents into a small number of groups, called clusters. The basic idea is that documents within a cluster should be similar to each other, and documents in different clusters should be dissimilar to each other. The similarity between two documents is based on the similarity of features (such as terms ...
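To make the idea of "similar within, dissimilar between" concrete, here is a minimal plain-Python sketch. It is not the tooling used in the book. It relies on several choices made here purely for illustration: raw term counts with whitespace tokenization as the document features, cosine similarity as the similarity measure, and spherical k-means as the clustering method. The seeding rule, which uses the first k documents as initial centroids, is also only an assumption. Each document is assigned to the centroid it is most similar to, and each centroid is then recomputed from its members.

```python
import math
from collections import Counter

def term_vectors(docs):
    """Build raw term-frequency vectors over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenization
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[t]) for t in vocab])
    return vocab, vectors

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Dot product, which equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(vectors, k, iterations=10):
    """Assign each document to the most cosine-similar of k centroids."""
    docs = [normalize(v) for v in vectors]
    # Assumption: seed with the first k documents so the result is reproducible.
    centroids = [list(d) for d in docs[:k]]
    labels = [0] * len(docs)
    for _ in range(iterations):
        # Assignment step: pick the centroid with the highest similarity.
        labels = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        # Update step: each centroid becomes the normalized sum of its members.
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels

# Hypothetical toy collection: two finance documents and two sports documents.
docs = [
    "stock market prices rise",
    "team wins the football match",
    "market investors sell stock",
    "football fans cheer the team",
]
vocab, vectors = term_vectors(docs)
labels = spherical_kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

The two finance documents share terms such as "market" and "stock", and the two sports documents share "team", "football", and "the". Those shared terms place each pair in the same cluster, while the lack of overlap across pairs keeps them in different clusters. In practice, term weighting (such as TF-IDF), stop-word removal, and dimension reduction would typically be applied before clustering.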


Text Mining and Analysis by Dr. Goutam Chakraborty, Murali Pagolu, Satish Garla

