Clustering text

Clustering is the process of finding groups of objects that are similar to each other. The goal is that objects within a cluster should be more similar to each other than to objects in other clusters. Like classification, it is not a specific algorithm so much as a general class of algorithms that solve a general problem.

Although there are a variety of clustering algorithms, all rely to some extent on a distance measure. For an algorithm to determine whether two objects belong in the same or different clusters it must be able to determine a quantitative measure of the distance (or, if you prefer, the similarity) between them. This calls for a numeric measure of distance: the smaller the distance, the greater the similarity between ...

Get Clojure for Data Science now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.