Chapter 19. Clustering: Dendrograms and Heat Maps

Clustering

Clustering refers to a number of related methods for exploring multivariate data. There are dozens of clustering functions available in R. We will focus on just one of them in this chapter: the hclust() function in base R. This function performs hierarchical clustering, which is one of the most commonly used clustering techniques and a good introduction to clustering in general. The idea is to put observations into clusters, or groups, in which the members of a single cluster are similar to each other and different from observations in other clusters. Further, a particular cluster may be judged to be similar, in varying degrees, to other clusters. We will use a graph called the dendrogram, which looks like an inverted tree, to understand the relationships of clusters to one another. (Figure 19-2, later in this chapter, presents an example of a dendrogram.)

Consider the mtcars dataset from Motor Trend Magazine’s 1974 report on the characteristics of a number of new models for that year. Let’s take a look at the first six rows of this dataset by using the head() function:

 > head(mtcars)
                    mpg cyl disp  hp drat    wt  qsec vs am
 Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1
 Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1
 Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1
 Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0
 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0
 Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0
                   gear ...
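
As a rough illustration of the hierarchical clustering idea described above, the following sketch applies hclust() to the mtcars data. Several choices here are assumptions made for illustration and do not come from the text: standardizing the variables with scale(), using the Euclidean distances that dist() computes by default, keeping hclust()'s default "complete" linkage, and cutting the tree into three clusters. Other choices may suit these data better.

```r
# Sketch only: one possible way to cluster the cars in mtcars.
# Assumptions (not from the text): standardized variables, Euclidean
# distance, hclust()'s default "complete" linkage, and k = 3 clusters.

scaled <- scale(mtcars)      # center and scale each variable
d <- dist(scaled)            # pairwise Euclidean distances between cars
hc <- hclust(d)              # agglomerative hierarchical clustering

# Draw the dendrogram; each leaf is one car model.
plot(hc, main = "mtcars dendrogram (sketch)", xlab = "", sub = "")

# Cut the tree into 3 groups (an arbitrary choice of k) and count members.
groups <- cutree(hc, k = 3)
table(groups)
```

Cars that are joined low in the dendrogram are the most similar on the standardized variables; joins higher up connect clusters that are less alike. The names scaled, d, hc, and groups are hypothetical names used only in this sketch.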


Graphing Data with R by John Jay Hilfiger
