Data clustering

A clustering problem consists of selecting and grouping homogeneous items from an initial set of data. To solve this problem, we must:

  • Identify a similarity measure between elements
  • Find out whether there are subsets of elements that are similar according to the chosen measure
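As an illustration of the first step, here is a minimal sketch that uses Euclidean distance as the similarity measure (a smaller distance means more similar elements). The choice of Euclidean distance, the function name euclidean_distance, and the sample points are assumptions made for this example; the text does not prescribe a particular measure.

```python
import math

def euclidean_distance(a, b):
    """Return the straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical sample data: two nearby points and one far away.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0)]

d_near = euclidean_distance(points[0], points[1])
d_far = euclidean_distance(points[0], points[2])

# The first two points are closer to each other than to the third,
# so under this measure they are candidates for the same cluster.
print(d_near < d_far)
```

Other measures, such as Manhattan distance or cosine similarity, could be swapped in without changing the overall approach.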

The algorithm determines which elements form a cluster and what degree of similarity unites them within the cluster.

Clustering algorithms are classed as unsupervised methods, because we do not assume any prior information about the structure and characteristics of the clusters.

The k-means algorithm

One of the simplest and most common clustering algorithms is k-means, which partitions a set of objects into k clusters on the basis of their attributes. Each cluster ...
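To make the idea concrete, the following is a minimal sketch of the standard k-means iteration (often called Lloyd's algorithm) in plain Python. It is not the book's implementation: the function name kmeans, its parameters, the random initialization from the data points, and the sample data are all assumptions chosen for illustration.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Partition points into k clusters; return (centroids, assignments)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the partition is stable.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments

# Hypothetical data: two visibly separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
        (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, assignments = kmeans(data, k=2)
print(assignments)
```

The cluster labels themselves (0 or 1) depend on the random initialization, so only the grouping is meaningful: the first three points should share one label and the last three the other.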
