How to do it...

Follow these steps to perform cluster validation using the proteinIntake dataset:

  1. First, load the protein.csv file, use the Country column as the row names, and scale the remaining columns:
> proteinIntake <- read.csv("protein.csv")
> rownames(proteinIntake) <- proteinIntake$Country
> proteinIntake$Country <- NULL
> proteinIntakeScaled <- as.data.frame(scale(proteinIntake))
  2. Compute the optimal number of clusters and visualize the result:
> library(NbClust)      # provides NbClust()
> library(factoextra)   # provides fviz_nbclust()
> nb <- NbClust(proteinIntakeScaled, distance = "euclidean", min.nc = 2, max.nc = 9, method = "ward.D2", index = "all")
> fviz_nbclust(nb) + theme_minimal()

The following figure gives us an idea of the computations:

The following image plots each candidate number of clusters against the number of indices that selected it:

  3. Next, ...
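
To make the preprocessing and the search over cluster counts more concrete, here is a minimal base R sketch of the candidate partitions that the NbClust call compares. It rests on several assumptions: the built-in USArrests data set stands in for protein.csv, the helper withinSS is a hypothetical name introduced only for this illustration, and the total within-cluster sum of squares is used as one simple quality measure rather than the full set of indices that NbClust evaluates.

```r
# Stand-in data (assumption): USArrests replaces protein.csv so the
# sketch runs without any external file.
standIn <- USArrests
standInScaled <- as.data.frame(scale(standIn))  # same preprocessing as step 1

# After scale(), every column has mean 0 and standard deviation 1.
print(round(colMeans(standInScaled), 10))
print(sapply(standInScaled, sd))

# Euclidean distances with Ward linkage, matching distance = "euclidean"
# and method = "ward.D2" in the NbClust call.
hc <- hclust(dist(standInScaled, method = "euclidean"), method = "ward.D2")

# Hypothetical helper: total within-cluster sum of squares of a partition.
withinSS <- function(data, labels) {
  sum(sapply(split(data, labels), function(g) {
    sum(scale(g, center = TRUE, scale = FALSE)^2)
  }))
}

# One candidate partition per k, for k = min.nc..max.nc (2 to 9).
ks <- 2:9
wss <- sapply(ks, function(k) withinSS(standInScaled, cutree(hc, k = k)))
names(wss) <- ks
print(round(wss, 2))
```

Because the partitions cut from a single tree are nested, this measure can only decrease as k grows, so it cannot pick a number of clusters by itself. That is one reason to combine many indices, as index = "all" does in step 2.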
