Apache Mahout Essentials by Jayani Withanawasam

Text clustering

Text clustering is a widely used application of clustering, found in areas such as records management systems, search, and business intelligence.

The vector space model and TF-IDF

In text clustering, the terms of a document are treated as features. The vector space model is an algebraic model that maps the terms in a document into an n-dimensional linear space.

However, to evaluate the similarity between data points, we first need to represent the textual information (terms) numerically and build feature vectors from those numerical values.
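The idea of turning terms into feature vectors can be illustrated with a minimal sketch. It is not Mahout's implementation: it assumes simple whitespace tokenization, uses raw term counts as the vector values, and uses cosine similarity as one possible similarity measure. The function names (`build_vocabulary`, `to_vector`, `cosine_similarity`) and the sample documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Collect the distinct terms of all documents; each term becomes one dimension."""
    terms = set()
    for doc in documents:
        # Assumption: lowercase + whitespace split stands in for a real tokenizer.
        terms.update(doc.lower().split())
    return sorted(terms)


def to_vector(document, vocabulary):
    """Map a document to a vector with one raw term count per vocabulary term."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents.
docs = [
    "apache mahout clusters text",
    "mahout clusters documents",
    "business intelligence reports",
]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
```

In this sketch, the first two documents share the terms "mahout" and "clusters", so their vectors have a positive cosine similarity, while the third document shares no terms with the first and scores zero against it.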

Each dimension of the feature vector represents a separate term. If a particular term is present in the document, then the vector value is set using the
