6.1 Theory of the correlation coefficient

6.1.1 Definitions

The standard measure of association between two random variables, which was first mentioned in Section 1.5 on ‘Means and Variances’, is the correlation coefficient

\[
\rho(x, y) = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}(x)\,\operatorname{Var}(y)}}
\]

It measures the strength of linear association between two variables, most commonly when both might be expected to have, at least approximately, a normal distribution. It is most important in cases where neither variable is thought to be dependent on the other. One example of its use would be an investigation of the relationship between the height and the weight of individuals in a population, and another would be in finding how closely related barometric gradients and wind velocities are. You should, however, be warned that it is very easy to conclude that measurements are closely related because they have a high correlation when, in fact, the correlation is due to their having a common time trend or a common cause and there is no direct relationship between the two (see the relationship between the growth of money supply and Scottish dysentery as pointed out in a letter to The Times dated 6 April 1977). You should also be aware that two closely related variables can have a low correlation if the relationship between them is highly non-linear.
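The last warning can be checked numerically. The following is a minimal sketch in Python, not taken from the text: the function name `correlation` and the small data sets are illustrative assumptions. It computes the usual sample analogue of the correlation coefficient, namely the sum of cross-products of deviations from the means divided by the square root of the product of the two sums of squared deviations, and applies it to an exactly linear and an exactly quadratic relationship.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient of paired observations (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: x values placed symmetrically about zero
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # y is an exact linear function of x
quadratic = [x * x for x in xs]    # y is an exact, but non-linear, function of x

r_linear = correlation(xs, linear)
r_quadratic = correlation(xs, quadratic)
print(r_linear, r_quadratic)
```

In both cases y is completely determined by x, yet only the linear relationship gives a correlation of one; for the quadratic relationship the positive and negative cross-products cancel and the correlation is zero.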

We suppose, then, that we have a set of n ordered pairs of observations, the pairs ...
