Follow these steps to validate clusters using the proteinIntake dataset:
- First, load the protein.csv file, use the Country column as the row names, and scale the variables:
> proteinIntake <- read.csv("protein.csv")
> rownames(proteinIntake) <- proteinIntake$Country
> proteinIntake$Country <- NULL
> proteinIntakeScaled <- as.data.frame(scale(proteinIntake))
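To see what this preprocessing does, the following minimal sketch applies the same three operations to a tiny made-up data frame. The `toy` values and the column names `RedMeat` and `Fish` are assumptions for illustration only and are not taken from protein.csv. After `scale()`, every column should have a mean of approximately 0 and a standard deviation of 1, and the country names should survive as row names.

```r
# Toy stand-in for the protein data (made-up values, not from protein.csv)
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  RedMeat = c(10, 8, 13, 7),
  Fish    = c(0.2, 2.1, 4.5, 1.2)
)

rownames(toy) <- toy$Country            # use the country names as row labels
toy$Country <- NULL                     # drop the now-redundant column
toyScaled <- as.data.frame(scale(toy))  # center and scale each column

col_means <- colMeans(toyScaled)        # approximately 0 for every column
col_sds   <- apply(toyScaled, 2, sd)    # 1 for every column
print(round(col_means, 10))
print(col_sds)
```

Scaling matters here because Euclidean distance would otherwise be dominated by whichever variable has the largest numeric range.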
- Compute the optimal number of clusters and visualize the result:
> library(NbClust)
> library(factoextra)
> nb <- NbClust(proteinIntakeScaled, distance = "euclidean", min.nc = 2, max.nc = 9, method = "ward.D2", index = "all")
> fviz_nbclust(nb) + theme_minimal()
The following figure gives us an idea of the computations:
The following image gives us a plot of the number of clusters versus frequency:
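NbClust computes many validity indices and reports the number of clusters that most of them favor. As a rough illustration of what a single criterion looks like, the sketch below uses only base R to compute the total within-cluster sum of squares for 2 to 9 clusters under the same Euclidean distance and ward.D2 linkage. This is not NbClust's procedure; the synthetic data, the `wss` helper, and the variable names are assumptions made for this example.

```r
set.seed(42)

# Synthetic data: three well-separated groups of 10 points in 2 dimensions
centers <- rbind(c(0, 0), c(6, 6), c(0, 6))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(10, centers[i, 1], 0.5), rnorm(10, centers[i, 2], 0.5))
}))
toyScaled <- scale(toy)

# Ward hierarchical clustering on Euclidean distances
hc <- hclust(dist(toyScaled, method = "euclidean"), method = "ward.D2")

# Hypothetical helper: total within-cluster sum of squares for a partition
wss <- function(x, labels) {
  sum(sapply(split(as.data.frame(x), labels), function(g) {
    sum(scale(g, scale = FALSE)^2)  # squared deviations from the cluster mean
  }))
}

ks <- 2:9
wss_by_k <- sapply(ks, function(k) wss(toyScaled, cutree(hc, k = k)))
names(wss_by_k) <- ks
print(round(wss_by_k, 3))
```

Because the partitions cut from one dendrogram are nested, the within-cluster sum of squares can only fall as the number of clusters grows, so the value itself is not a vote. The usual reading is to look for the point where the decrease levels off, which for this synthetic data should be at three clusters.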
- Next, ...