10 Clustering data into groups

This section covers

  • Clustering data by centrality
  • Clustering data by density
  • Trade-offs between clustering algorithms
  • Executing clustering using the scikit-learn library
  • Iterating over clusters using Pandas

Clustering is the process of organizing data points into conceptually meaningful groups. What makes a given group “conceptually meaningful”? There is no easy answer to that question. The usefulness of any clustered output depends on the task we’ve been assigned.

Imagine that we’re asked to cluster a collection of pet photos. Do we cluster fish and lizards in one group and fluffy pets (such as hamsters, cats, and dogs) in another? Or should hamsters, cats, and dogs be assigned three separate clusters of ...
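The two groupings posed in the pet-photo question can be made concrete with a small sketch. It assumes hypothetical pet records (the names and the `fluffy` flag are invented for illustration, not data from the book) and a hypothetical helper, `group_by`. The groups come from hand-picked rules rather than from an actual clustering algorithm. The point is only that the same records split into different partitions depending on which criterion the task cares about.

```python
from collections import defaultdict

# Hypothetical pet records (illustrative only, not data from the book):
# each pet has a species label and a flag for whether it is fluffy.
pets = [
    {"name": "Bubbles", "species": "fish", "fluffy": False},
    {"name": "Spike", "species": "lizard", "fluffy": False},
    {"name": "Nibbles", "species": "hamster", "fluffy": True},
    {"name": "Whiskers", "species": "cat", "fluffy": True},
    {"name": "Rex", "species": "dog", "fluffy": True},
]


def group_by(records, key):
    """Hypothetical helper: partition records by the value that key(record) returns."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record["name"])
    return dict(groups)


# Criterion 1: fluffy vs. non-fluffy pets, giving two groups.
by_fluffiness = group_by(pets, lambda pet: "fluffy" if pet["fluffy"] else "not fluffy")

# Criterion 2: one group per species, giving five groups.
by_species = group_by(pets, lambda pet: pet["species"])

print(by_fluffiness)
print(by_species)
```

Neither partition is "correct" on its own. Fish and lizards land together under the first rule but apart under the second, and which output is useful depends entirely on the question being asked of the data.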
