Chapter 25

Multivariate Statistics

This class of statistical methods is fundamentally different from the others in this book because there is no response variable. Instead of trying to understand variation in a response variable in terms of explanatory variables, in multivariate statistics we look for structure in the data. The problem is that structure is rather easy to find, and all too often it is a feature of that particular data set alone. The real challenge is to find general structure that will apply to other data sets as well. Unfortunately, there is no guaranteed means of detecting pattern, and a great deal of ingenuity has been shown by statisticians in devising means of pattern recognition in multivariate data sets. The main division is between methods that assume a given structure and seek to divide the cases into groups, and methods that seek to discover structure from inspection of the dataframe. The really important point is that you need to know exactly what the question is that you are trying to answer. Do not mistake the opaque for the profound.

The multivariate techniques implemented in R include:

  • principal components analysis (prcomp);
  • factor analysis (factanal);
  • cluster analysis (hclust, kmeans);
  • discriminant analysis (lda, qda);
  • neural networks (nnet).

These techniques are not recommended unless you know exactly what you are doing, and exactly why you are doing it. Beginners are sometimes attracted to multivariate techniques because of the complexity of ...

Get The R Book, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.