9 Correlation

In this chapter, we look at relationships between variables. After completing this chapter, you should be able to:

  • Define what we mean by correlation
  • Determine the statistical significance of apparent correlation by resampling
  • Calculate the correlation coefficient
  • Assess whether correlation suggests causation

Think back to the “no-fault” study of errors in hospitals. In that experiment, we found that introducing a no-fault reporting system reduced the number of serious errors. In other words, there was a relationship between one variable (the type of reporting system) and another variable (the reduction in errors).

“Type of reporting system” is a binary variable. It has just two values: “regular” and “no-fault.” Often, you may want to determine whether there is a relationship involving the amount of something, not just whether it is “on” or “off.”

For example, is there a relationship between employee training and productivity? Training is expensive, and organizations need to know not simply whether training helps but also how much it helps.
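As a rough illustration of the training and productivity question, the sketch below computes a Pearson correlation coefficient and then uses a simple permutation (shuffling) test to gauge how often a correlation that strong might arise by chance. The data values, the variable names, and the helper functions `pearson_r` and `permutation_p_value` are all hypothetical, invented here for illustration; the resampling procedure is one common approach and may differ in detail from the one developed later in the chapter.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def permutation_p_value(x, y, n_resamples=10_000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    count = 0
    for _ in range(n_resamples):
        # Shuffling y breaks any real pairing between x and y
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            count += 1
    return count / n_resamples


# Hypothetical data, made up for this sketch (not from any study)
training_hours = [2, 4, 5, 7, 8, 10, 12, 15]
units_per_day = [20, 23, 22, 27, 26, 31, 30, 36]

r = pearson_r(training_hours, units_per_day)
p = permutation_p_value(training_hours, units_per_day)
print(f"r = {r:.2f}, permutation p-value = {p:.4f}")
```

A coefficient near +1 or -1 indicates a strong linear relationship, and a small p-value suggests that shuffled data rarely produce a correlation that strong. Neither number, on its own, shows that training causes higher productivity.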
