Chapter 15
Clustering
IN THIS CHAPTER
Exploring the potentialities of unsupervised clustering
Making K-means work with small and big data
Trying DBScan as an alternative option
One of the basic abilities that humans have exercised since primitive times is to divide the known world into separate classes, with individual objects sharing common features deemed important by the classifier. Starting with primitive cave dwellers classifying the natural world they lived in, distinguishing plants and animals useful or dangerous for their survival, in modern times, marketing departments classify consumers into target segments and then act with proper marketing plans.
Dealing with big data streams today requires the same classificatory ability of our ancestors, but on a different scale. To leverage the information in data requires specialized algorithms capable of performing two tasks: learning to assign examples to predefined classes (the supervised approach) and identifying new and interesting classes that we weren’t aware of (unsupervised learning).
A data-driven approach to classification based on unsupervised learning, called clustering, is presented in the first part of this chapter, ...
Get Python for Data Science For Dummies, 3rd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.