Graphing Data with R by John Jay Hilfiger


Chapter 19. Clustering: Dendrograms and Heat Maps

Clustering

Clustering refers to a number of related methods for exploring multivariate data. There are dozens of clustering functions available in R. We will focus on just one of them in this chapter: the hclust() function in base R. This function performs hierarchical clustering, which is one of the most commonly used clustering techniques and will be a good introduction to clustering in general. The idea is to put observations into clusters, or groups, in which the members of a single cluster are similar to each other and different from observations in other clusters. Further, a particular cluster may be judged to be similar, in varying degrees, to other clusters. We will use a graph called the dendrogram, which looks like an inverted tree, to understand the relationships of clusters to one another. (Figure 19-2, later in this chapter, presents an example of a dendrogram.)
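To make the idea concrete before turning to real data, here is a minimal sketch of hierarchical clustering with hclust(). The six points in pts are hypothetical values invented purely for illustration, and the choice of Euclidean distance with the default complete linkage is an assumption, not necessarily the setup used later in this chapter.

```r
# Toy data (hypothetical values, chosen only for illustration):
# six observations that fall into two visibly separated groups.
pts <- rbind(a = c(1.0, 1.1), b = c(1.2, 0.9), c = c(0.8, 1.0),
             d = c(5.0, 5.2), e = c(5.1, 4.9), f = c(4.8, 5.0))

d  <- dist(pts)    # pairwise Euclidean distances between observations
hc <- hclust(d)    # hierarchical clustering; "complete" linkage is the default

plot(hc)           # draw the dendrogram (the inverted tree)

# Cut the tree into two clusters and show which cluster each point joined.
groups <- cutree(hc, k = 2)
print(groups)
```

Here a, b, and c land in one cluster and d, e, and f in the other. In the dendrogram, the height at which two branches join reflects how dissimilar the merged groups are, so the two groups only come together near the top of the tree.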

Consider the mtcars dataset from Motor Trend Magazine’s 1974 report on the characteristics of a number of new models for that year. Let’s take a look at the first six rows of this dataset by using the head() function:

 > head(mtcars)
                    mpg cyl disp  hp drat    wt  qsec vs am
 Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1
 Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1
 Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1
 Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0
 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0
 Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0
                   gear ...
