The Pearson correlation coefficient

The standard tool for comparing two numerical features is the Pearson correlation coefficient, commonly known simply as the correlation coefficient (there are many other correlation coefficients, but this one is by far the most popular). It is a numerical measure of the strength of the linear association between two numerical features.

I will repeat the keywords: linear association. If the relationship between the features is nonlinear, the correlation coefficient can be misleading, which is why it is always a good idea to look at both the scatter plot and the correlation coefficient.
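To illustrate how the coefficient can mislead, here is a minimal sketch using NumPy. The data is synthetic and purely hypothetical (the variable names `x`, `y_linear`, and `y_quadratic` are not from the book's dataset): one feature depends on `x` linearly, the other depends on `x` just as strongly but through a quadratic curve.

```python
import numpy as np

# Hypothetical data: x is evenly spaced and symmetric around zero.
x = np.linspace(-3, 3, 61)

y_linear = 2 * x + 1    # exact linear relationship
y_quadratic = x ** 2    # exact, but nonlinear, relationship

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry [0, 1] is the Pearson coefficient of the pair.
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

In this sketch the linear pair should give a coefficient very close to 1, while the quadratic pair should give a coefficient very close to 0, even though `y_quadratic` is completely determined by `x`. A scatter plot of the second pair would show the U-shaped pattern that the coefficient alone misses.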

Knowing the key characteristics of this coefficient will help us with the interpretation:

  • Its ...
