CHAPTER 37 Does Smoking Cause Lung Cancer?

I am sure everyone reading this book believes that smoking causes lung cancer. Doctors and statisticians followed a long and winding road before this conclusion became generally accepted. As late as 1960, only one-third of all U.S. doctors believed that smoking caused lung cancer (see www.ncbi.nlm.nih.gov/pubmed/22345227). In this chapter, we will show you how the world became convinced that smoking causes lung cancer.

Correlation and Causation Redux

As first discussed in Chapter 8, “Modeling Relationships Between Two Variables,” the correlation between two quantitative variables, X and Y, is always between –1 and +1. Correlation measures the strength of the linear relationship between X and Y. A correlation near +1 tells us that when X is larger (smaller) than average, Y tends to be larger (smaller) than average. A correlation near –1 tells us that when X is larger (smaller) than average, Y tends to be smaller (larger) than average. Many people believe that a correlation near +1 or –1 implies a causal relationship between X and Y. This is often untrue. The website www.tylervigen.com/spurious-correlations contains many examples of highly correlated variables for which there is surely no cause-and-effect relationship. For example, between 2000 and 2009, there was a 0.9926 correlation between the divorce rate in Maine and per capita consumption of margarine! Sometimes a third variable can explain a correlation. For example, if we let ...
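The definition of correlation given above can be made concrete with a short Python sketch. The function name pearson_correlation and the three small data lists are hypothetical, made up purely for illustration; they are not data from this book. The sketch shows only that a strongly increasing linear pattern yields a correlation near +1 and a strongly decreasing one yields a correlation near –1. It says nothing about whether X causes Y.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustration data (not from the book).
x = [1, 2, 3, 4, 5]
y_up = [2.1, 3.9, 6.2, 8.0, 9.8]    # tends to rise as x rises
y_down = [9.8, 8.0, 6.2, 3.9, 2.1]  # tends to fall as x rises

r_up = pearson_correlation(x, y_up)      # close to +1
r_down = pearson_correlation(x, y_down)  # close to -1

print(f"r_up = {r_up:.4f}, r_down = {r_down:.4f}")
```

In practice, a library routine such as statistics.correlation (Python 3.10 and later) or numpy.corrcoef would normally be used instead of a hand-written function; the explicit version is shown here only to expose the arithmetic behind the number.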
