The idea of principal components analysis (PCA) is to find a small number of **linear combinations** of the variables that capture most of the variation in the dataframe as a whole. With a large number of variables it may be easier to work with a few such combinations than with the entire dataframe. Suppose, for example, that you had three variables measured on each subject, and you wanted to distil the essence of each individual's performance into a single number. An obvious solution is the arithmetic mean of the three numbers, 1/3 *v*_{1} + 1/3 *v*_{2} + 1/3 *v*_{3}, where *v*_{1}, *v*_{2} and *v*_{3} are the three variables (e.g. maths score, physics score and chemistry score for pupils' exam results). The vector of coefficients *l* = (1/3, 1/3, 1/3) defines a linear combination. Linear combinations whose coefficients satisfy Σ *l*_{i}^{2} = 1 are called standardized linear combinations (rescaling the mean's coefficients to unit length gives (1/√3, 1/√3, 1/√3)). Principal components analysis finds a set of orthogonal standardized linear combinations which together explain all of the variation in the original data. There are as many principal components as there are variables, but typically it is only the first few that explain important amounts of the total variation.
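
These properties can be checked numerically. The sketch below (in Python with NumPy; the data are hypothetical exam scores invented for illustration) extracts the principal components as eigenvectors of the covariance matrix and confirms that each component is a standardized linear combination, that the components are mutually orthogonal, and that their variances sum to the total variance in the data:

```python
import numpy as np

# Hypothetical scores (maths, physics, chemistry) for 8 pupils.
rng = np.random.default_rng(0)
X = rng.normal(loc=[60, 55, 50], scale=[10, 8, 12], size=(8, 3))

# Centre the data and eigendecompose the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each column of eigvecs is a standardized linear combination:
# the squared coefficients sum to 1, and the columns are orthogonal.
assert np.allclose(np.sum(eigvecs**2, axis=0), 1.0)
assert np.allclose(eigvecs.T @ eigvecs, np.eye(3), atol=1e-10)

# Together the components explain all of the variation:
# the eigenvalues sum to the total variance (trace of the covariance).
assert np.isclose(eigvals.sum(), np.trace(cov))

# Proportion of the total variation explained by the first component.
print(round(eigvals[0] / eigvals.sum(), 3))
```

The first eigenvalue's share of the total is the "amount of variation explained" referred to above; in real analyses one inspects these proportions to decide how many components are worth keeping.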

Calculating principal components is easy. Interpreting what the components mean in scientific terms is hard, and potentially equivocal. You need to be more than usually circumspect when evaluating multivariate statistical analyses.

The following dataframe contains mean dry weights (g) for 54 ...
