# Chapter 9. Correlation

# Standard Scores

In this chapter, we look at relationships between variables. For example, we have a sense that height is related to weight; people who are taller tend to be heavier. Correlation is a description of this kind of relationship.

A challenge in measuring correlation is that the variables we want to compare might not be expressed in the same units. For example, height might be in centimeters and weight in kilograms. And even if they are in the same units, they come from different distributions.

There are two common solutions to these problems:

- Transform all values to standard scores. This leads to the Pearson coefficient of correlation.

- Transform all values to their percentile ranks. This leads to the Spearman coefficient.

If *X* is a series of values, $x_i$, we
can convert to standard scores by subtracting the mean and dividing by
the standard deviation:

$$z_i = (x_i - \mu) / \sigma$$

The numerator is a deviation: the signed distance from the mean. Dividing
by *σ* normalizes
the deviation, so the values of *Z* are
dimensionless (no units) and their distribution has mean 0 and variance
1.
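As a minimal sketch of this transformation, the following uses only the Python standard library. The function name `standard_scores` and the sample heights are made up for illustration, and the population standard deviation (`statistics.pstdev`) is assumed for *σ*.

```python
import statistics


def standard_scores(xs):
    """Convert a sequence of values to standard scores.

    Assumes sigma is the population standard deviation.
    """
    mu = statistics.fmean(xs)       # mean of the series
    sigma = statistics.pstdev(xs)   # population standard deviation
    return [(x - mu) / sigma for x in xs]


# Made-up heights in centimeters, for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
zs = standard_scores(heights_cm)

# The standardized values have no units; their mean is (numerically) 0
# and their population variance is (numerically) 1.
print(statistics.fmean(zs), statistics.pvariance(zs))
```

If the sample standard deviation were used instead, the variance of the result would differ slightly from 1.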

If *X* is normally distributed, so is
*Z*; but if *X* is skewed or
has outliers, the same is true of *Z*, because
standardizing only shifts and rescales the values. In those cases, it is more
robust to use percentile ranks. If *R* contains the
percentile ranks of the values in *X*, the
distribution of *R* is uniform between 0 and 100,
regardless of the distribution of *X*.
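The robustness of percentile ranks can be seen in a small sketch. The helper `percentile_ranks` is a hypothetical name, and it assumes one common definition: the percentage of values less than or equal to a given value. Other definitions handle ties and endpoints differently.

```python
def percentile_ranks(xs):
    """Return the percentile rank of each value in xs.

    Assumed definition: 100 * (number of values <= x) / n.
    """
    n = len(xs)
    return [100.0 * sum(1 for y in xs if y <= x) / n for x in xs]


# Made-up data with one extreme outlier, for illustration only.
values = [1, 2, 3, 4, 1000]
ranks = percentile_ranks(values)

# The outlier gets the top rank, but the ranks stay evenly spaced.
print(ranks)
```

This version compares every pair of values, so it is quadratic in the length of the series; that keeps the sketch short but would be slow for large inputs.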

# Covariance

Covariance is a measure of
the tendency of two variables to vary together. If we have two series,
*X* and ...
