R for Data Science by Dan Toomey

Chapter 6. Data Analysis – Clustering

Clustering, also called cluster analysis, is the process of grouping objects so that the objects in a group are more similar to each other than to objects in other groups.
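As a minimal sketch of what "more similar" can mean in practice, the following base R snippet builds two hypothetical groups of 2-D points and compares the average distance between points in the same group with the average distance between points in different groups. The data, the group labels, and the choice of Euclidean distance are assumptions made for illustration only.

```r
set.seed(1)                                  # make the random points reproducible

# Two hypothetical groups of 20 points each, centered near (0, 0) and (5, 5)
a <- matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2)
b <- matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
pts <- rbind(a, b)
labels <- rep(c(1, 2), each = 20)            # known group of each point

d <- as.matrix(dist(pts))                    # pairwise Euclidean distances
same <- outer(labels, labels, "==")          # TRUE where both points share a group
off_diag <- row(d) != col(d)                 # skip each point's distance to itself

within_mean  <- mean(d[same & off_diag])     # average distance inside a group
between_mean <- mean(d[!same])               # average distance across groups

print(c(within = within_mean, between = between_mean))
```

A grouping that fits the definition should show a within-group average well below the between-group average, as it does for these well-separated points.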

R has several tools for clustering data. This chapter investigates the following:

  • K-means, including optimal number of clusters
  • Partitioning Around Medoids (PAM)
  • Bayesian hierarchical clustering
  • Affinity propagation clustering
  • Computing a gap statistic to estimate the number of clusters
  • Hierarchical clustering
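Two of the techniques listed above, k-means and hierarchical clustering, are available in base R without extra packages. The sketch below is only an illustration: the built-in iris dataset, the choice of three clusters, and the average linkage method are all assumptions, not necessarily what the chapter's own examples use.

```r
# Base R only: kmeans(), hclust(), dist(), and cutree() come from the stats package
set.seed(42)                                 # k-means starts from random centers

features <- scale(iris[, 1:4])               # standardize the four numeric columns

# K-means with an assumed k of 3 and 25 random restarts
km <- kmeans(features, centers = 3, nstart = 25)

# Agglomerative hierarchical clustering on Euclidean distances
hc <- hclust(dist(features), method = "average")
hc_groups <- cutree(hc, k = 3)               # cut the tree into 3 groups

# Cross-tabulate the two cluster assignments to see where they agree
print(table(kmeans = km$cluster, hclust = hc_groups))
```

The cluster numbers themselves are arbitrary labels, so the table is read by looking at which cells hold most of the observations, not by matching numbers across the two methods.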

Packages

Several R packages provide clustering functionality. We will use the following packages in the examples:

  • NbClust: This provides indices for determining the number of clusters
  • fpc: This contains flexible ...
