Chapter 15

Clustering

IN THIS CHAPTER

  • Exploring the potentialities of unsupervised clustering
  • Making K-means work with small and big data
  • Trying DBScan as an alternative option

One of the basic abilities that humans have exercised since primitive times is dividing the known world into separate classes whose members share common features that the classifier deems important. It began with cave dwellers classifying the natural world they lived in, distinguishing the plants and animals that were useful for their survival from those that were dangerous. It continues in modern times, when marketing departments classify consumers into target segments and then act on each segment with an appropriate marketing plan.

Classifying is crucial to our process of building new knowledge because, by gathering similar objects, we can:

  • Refer to all the items in a class by the same name
  • Summarize relevant features with a representative class type
  • Associate particular actions or recall specific knowledge automatically

Dealing with big data streams today requires the same classificatory ability, but on a different scale. To spot unknown groups of signals present in the data, we need specialized algorithms that are both able ...
