Chapter 15

Clustering

IN THIS CHAPTER

Exploring the potential of unsupervised clustering

Making K-means work with small and big data

Trying DBSCAN as an alternative option

One of the basic abilities that humans have exercised since primitive times is dividing the known world into separate classes, in which individual objects share common features deemed important by the classifier. Primitive cave dwellers classified the natural world they lived in, distinguishing the plants and animals that were useful for their survival from those that were dangerous. In modern times, marketing departments classify consumers into target segments and then act on them with suitable marketing plans.

Dealing with big data streams today requires the same classificatory ability our ancestors had, but on a different scale. Leveraging the information in data requires specialized algorithms capable of performing two tasks: learning to assign examples to predefined classes (the supervised approach) and identifying new and interesting classes that we weren’t aware of (the unsupervised approach).
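The two tasks can be contrasted in a minimal plain-Python sketch. Everything here is an assumption made for illustration: the toy one-dimensional data, the class names "low" and "high", and the helper functions `fit_centroids`, `predict`, `assign`, and `kmeans_1d` are all hypothetical, and the bare-bones K-means loop is a stand-in for, not a copy of, the library code a real project would use.

```python
# Toy 1-D data with two obvious groups (made-up values for illustration).
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]  # predefined classes


def mean(values):
    return sum(values) / len(values)


# Supervised: learn one centroid per known class from labeled examples.
def fit_centroids(xs, ys):
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}


# Assign a new example to the class whose centroid is nearest.
def predict(centroids, x):
    return min(centroids, key=lambda c: abs(centroids[c] - x))


# Index of the nearest center for every point.
def assign(xs, centers):
    return [min(range(len(centers)), key=lambda i: abs(centers[i] - x))
            for x in xs]


# Unsupervised: no labels are used; a bare-bones K-means alternates
# between assigning points to centers and moving centers to the mean
# of their assigned points.
def kmeans_1d(xs, centers, iterations=10):
    centers = list(centers)
    for _ in range(iterations):
        assignment = assign(xs, centers)
        for i in range(len(centers)):
            members = [x for x, a in zip(xs, assignment) if a == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = mean(members)
    return assign(xs, centers), centers


centroids = fit_centroids(points, labels)
new_label = predict(centroids, 4.0)

# Start the two centers at the extremes of the data (an arbitrary choice).
clusters, centers = kmeans_1d(points, [min(points), max(points)])

print("supervised prediction for 4.0:", new_label)
print("unsupervised cluster ids:", clusters)
```

In the supervised half, the class names come from the labeled examples. In the unsupervised half, the algorithm returns only anonymous cluster ids, and deciding what those groups mean is left to the analyst.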

A data-driven approach to classification based on unsupervised learning, called clustering, is presented in the first part of this chapter, ...

Python for Data Science For Dummies, 3rd Edition by John Paul Mueller, Luca Massaron
