Data Science Using Python and R

by Chantal D. Larose, Daniel T. Larose
April 2019
Content level: Beginner to intermediate
240 pages
6h 47m
English
Wiley

Chapter 10 CLUSTERING

10.1 WHAT IS CLUSTERING?

Clustering refers to the grouping of records, observations, or cases into classes of similar objects. A cluster is a collection of records that are similar to one another and dissimilar to records in other clusters. Clustering differs from classification in that there is no target variable for clustering. The clustering task does not try to classify, estimate, or predict the value of a target variable. Instead, clustering algorithms seek to segment the entire data set into relatively homogeneous subgroups or clusters, where the similarity of the records within the cluster is maximized and the similarity to records outside this cluster is minimized.
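To make the idea of segmenting records without a target variable concrete, here is a minimal sketch in plain Python. It uses k-means, which is only one of many possible clustering algorithms and is chosen here as an assumption for illustration, not because the passage above prescribes it. The function name `k_means`, the choice of the first k records as starting centroids, and the toy (age, income) records are all hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, n_iter=20):
    """Group points into k clusters. No target variable is used."""
    # Assumption: start from the first k records (simple and deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each record joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member records.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids


# Made-up toy records (age, income in $1000s); not data from the book.
records = [(25, 30), (27, 32), (26, 31), (60, 120), (62, 118), (61, 121)]
labels, centroids = k_means(records, k=2)
print(labels)
print(centroids)
```

On these toy records the three younger, lower-income records end up in one cluster and the three older, higher-income records in the other, even though no label was ever supplied. In practice the variables would usually be standardized first so that no single field dominates the distance calculation.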

For example, the Nielsen PRIZM segments, developed by Claritas, Inc., represent demographic profiles of each geographic area in the United States, in terms of distinct lifestyle types, as defined by zip code. The clusters identified for zip code 90210, Beverly Hills, California, for instance, are:

  • Cluster 01: Upper Crust Estates
  • Cluster 03: Movers and Shakers
  • Cluster 04: Young Digerati
  • Cluster 07: Money and Brains
  • Cluster 16: Bohemian Mix

The description for Cluster 01: Upper Crust is “The nation’s most exclusive address, Upper Crust is the wealthiest lifestyle in America, a Haven for empty‐nesting couples between the ages of 45 and 64. No segment has a higher concentration of residents earning over $100,000 a year and possessing a postgraduate degree. And none has a more opulent standard ...


ISBN: 9781119526810