Chapter 8. Cluster Analysis

A clustering algorithm is one that identifies groups of data points according to their proximity to each other. These algorithms are similar to classification algorithms in that they also partition a dataset into subsets of similar points. But in classification, we already have data whose classes have been identified, such as sweet fruit. In clustering, we seek to discover the unknown groups themselves.

Measuring distances

A metric on a set S of points is a function d: S × S → ℝ that satisfies these conditions for all p, q, r ∈ S:

  1. d(p,q) = 0 ⇔ p = q
  2. d(p,q) = d(q,p)
  3. d(p,q) ≤ d(p,r)+d(r,q)
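
The three conditions can be checked numerically on a small sample of points. The following is a minimal sketch, not code from the book: it assumes points are stored as double[] arrays and uses Euclidean distance as the example metric. The class name MetricSketch and both method names are hypothetical. Passing the check on a finite sample does not prove that a function is a metric, but a single failure proves that it is not.

```java
import java.util.Arrays;
import java.util.function.ToDoubleBiFunction;

/** Sketch (hypothetical names): tests the three metric conditions on a finite sample. */
final class MetricSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double euclidean(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double diff = p[i] - q[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns true if d satisfies all three conditions for every p, q, r in points. */
    static boolean satisfiesMetricConditions(
            double[][] points, ToDoubleBiFunction<double[], double[]> d) {
        final double eps = 1e-9; // tolerance for floating-point rounding
        for (double[] p : points) {
            for (double[] q : points) {
                double dpq = d.applyAsDouble(p, q);
                // Condition 1: d(p,q) = 0 exactly when p = q
                if (Arrays.equals(p, q) != (Math.abs(dpq) < eps)) {
                    return false;
                }
                // Condition 2: symmetry, d(p,q) = d(q,p)
                if (Math.abs(dpq - d.applyAsDouble(q, p)) > eps) {
                    return false;
                }
                // Condition 3: triangle inequality, d(p,q) <= d(p,r) + d(r,q)
                for (double[] r : points) {
                    if (dpq > d.applyAsDouble(p, r) + d.applyAsDouble(r, q) + eps) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {3, 4}, {6, 8}, {1, -2}};
        System.out.println(euclidean(points[0], points[1])); // prints 5.0
        System.out.println(satisfiesMetricConditions(points, MetricSketch::euclidean));
    }
}
```

For contrast, the squared Euclidean distance fails condition 3 on the same sample: from (0,0) to (6,8) it gives 100, while going through (3,4) gives 25 + 25 = 50.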

Normally, we think of the number d(p,q) as the distance ...
