Chapter 6 Clustering and Topic Extraction

Introduction

What Is Clustering?

Singular Value Decomposition and Latent Semantic Indexing

Topic Extraction

Scoring

Summary

References

Introduction

In Chapters 1 through 5, you learned how to take a collection of documents and convert them into a vector space model that represents features of each document using numeric values. In this chapter, we discuss how to take that vector space model and assign each document to one of a small number of groups, called clusters. The basic idea is that documents within a cluster should be similar to each other, and documents in different clusters should be dissimilar. The similarity between two documents is based on the similarity of features (such as terms ...
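To make the idea of feature-based similarity concrete, here is a minimal sketch in Python. It assumes cosine similarity as the measure, which the text above does not specify, and it uses a hypothetical four-term vocabulary with made-up term counts. The names `cosine_similarity`, `docs`, `doc_a`, `doc_b`, and `doc_c` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A document with no features has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical term counts over the vocabulary
# ["cluster", "topic", "engine", "wheel"] (assumed for illustration).
docs = {
    "doc_a": [3, 2, 0, 0],
    "doc_b": [2, 3, 0, 1],
    "doc_c": [0, 0, 4, 3],
}

sim_ab = cosine_similarity(docs["doc_a"], docs["doc_b"])
sim_ac = cosine_similarity(docs["doc_a"], docs["doc_c"])

# doc_a and doc_b share terms, so they score higher than doc_a and doc_c,
# which share none.
print(f"doc_a vs doc_b: {sim_ab:.3f}")
print(f"doc_a vs doc_c: {sim_ac:.3f}")
```

Under these assumptions, a clustering method would tend to place `doc_a` and `doc_b` in the same cluster and `doc_c` in a different one, because the first pair shares terms and the second pair does not.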
