R in a Nutshell

by Joseph Adler
January 2010
634 pages
O'Reilly Media, Inc.

Correlation and Covariance

Very often, when analyzing data, you want to know whether two variables are correlated. Informally, correlation answers the question “when we increase (or decrease) x, does y increase (or decrease), and by how much?” Formally, correlation measures the linear dependence between two random variables. Correlation measures range between −1 and 1: 1 means that one variable is a positive linear function of the other, 0 means that there is no linear relationship between the two variables (although they may still be related in a nonlinear way), and −1 means that one variable is a negative linear function of the other (the two move in exactly opposite directions; see Figure 16-1).

Figure 16-1. Correlation

The most commonly used correlation measurement is the Pearson correlation statistic (it’s the formula behind the CORREL function in Excel):

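$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$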
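To make the Pearson statistic concrete, here is a minimal Python sketch using only the standard library. It computes the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations. The function name `pearson` is our own; in R itself, the built-in `cor(x, y)` computes this statistic by default. The last call uses y = x² on values symmetric around zero: y is completely determined by x, yet the result is 0 because there is no linear relationship.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n  # mean of x
    y_bar = sum(y) / n  # mean of y
    # Numerator: sum of products of deviations from the means
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Summed squared deviations for each variable
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # sqrt(sxx) * sqrt(syy) == sqrt(sxx * syy)
    return num / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson(x, [2 * v + 1 for v in x]))            # 1.0  (positive linear)
print(pearson(x, [-3 * v + 10 for v in x]))          # -1.0 (negative linear)
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # 0.0  (y = x**2)
```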

where x̄ is the mean of variable x, ȳ is the mean of variable y, and n is the number of observations. The Pearson correlation statistic is rooted in properties of the normal distribution and works best with normally distributed data. An alternative correlation measure is the Spearman correlation statistic. Spearman correlation is a nonparametric statistic: it is computed from the ranks of the values rather than the values themselves, so it doesn't assume any particular underlying distribution:

$$\rho = 1 - \frac{6\sum_{i=1}^{n}\bigl(\operatorname{rank}(x_i)-\operatorname{rank}(y_i)\bigr)^2}{n(n^2-1)}$$
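One way to compute the Spearman statistic is to apply Pearson's formula to the ranks of x and y. This minimal Python sketch assumes the common convention that tied values share the average of their rank positions. It also includes the rank-difference shortcut ρ = 1 − 6Σd²/(n(n²−1)), where d is the difference between the two ranks of each observation. The shortcut matches the rank-based Pearson value only when there are no ties. All function names here are our own. In R, `cor(x, y, method = "spearman")` provides this statistic.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation (assumes neither sequence is constant)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return num / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def spearman_no_ties(x, y):
    """Rank-difference shortcut; exact only when there are no tied values."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but not linear
print(spearman(x, y))     # 1.0: ranks agree perfectly
print(pearson(x, y))      # below 1 because the relationship is not linear
```

Because only the ordering matters, any strictly increasing transformation of y leaves the Spearman value unchanged, while the Pearson value generally changes.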

Another measurement ...

ISBN: 9781449377502