Chapter 13
Document Clustering: The Next Frontier
David C. Anastasiu
University of Minnesota, Minneapolis, MN, anast021@umn.edu
Andrea Tagarelli
University of Calabria, Arcavacata di Rende, Italy, tagarelli@deis.unical.it
George Karypis
University of Minnesota, Minneapolis, MN, karypis@cs.umn.edu
13.1 Introduction
The proliferation of documents, both on the Web and in private systems, makes knowledge discovery in document collections arduous. Clustering has long been recognized as a useful tool for the task. It groups like items together, maximizing intra-cluster similarity and inter-cluster distance. Clustering can provide insight into the make-up of a document collection and is often used as the initial step in data analysis.
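To make the objective concrete, the following is a minimal sketch of how intra-cluster similarity and inter-cluster similarity might be measured for a given grouping. It assumes a bag-of-words term-frequency representation and cosine similarity, neither of which is prescribed by the text above; the toy documents and all function names are hypothetical and chosen only for illustration. Inter-cluster distance can be read here as one minus the inter-cluster similarity.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_intra_similarity(clusters):
    """Average similarity over all document pairs that share a cluster."""
    sims = [cosine(x, y) for c in clusters for x, y in combinations(c, 2)]
    return sum(sims) / len(sims) if sims else 0.0


def mean_inter_similarity(clusters):
    """Average similarity over all document pairs in different clusters."""
    sims = [
        cosine(x, y)
        for c1, c2 in combinations(clusters, 2)
        for x in c1
        for y in c2
    ]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical toy collection: two finance and two sports documents.
docs = [
    "stock market shares fall",
    "market shares rise stock",
    "team wins football match",
    "football team loses match",
]
vectors = [tf_vector(d) for d in docs]

# A grouping by topic versus a grouping that mixes topics.
good = [vectors[:2], vectors[2:]]
bad = [[vectors[0], vectors[2]], [vectors[1], vectors[3]]]

for name, clusters in (("good", good), ("bad", bad)):
    print(name, mean_intra_similarity(clusters), mean_inter_similarity(clusters))
```

Under these assumptions, the grouping by topic scores high on intra-cluster similarity and low on inter-cluster similarity, while the mixed grouping shows the opposite pattern. A clustering algorithm searches for groupings of the first kind.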
While most document clustering ...