Chapter 5. Using Causal Diagrams to Deconfound Data Analyses

Cause-and-effect relationships are so fundamental to our understanding of the world that even a kindergartner intuitively grasps them. However, that intuition, along with our data analyses, can be led astray by confounding, as we saw in Chapter 1. If we don’t account for a joint cause of our two variables of interest, we’ll misinterpret what’s happening, and the regression coefficient for our cause of interest will be biased. We also saw the risks of accounting for the wrong variables. Deciding which variables to include and which to leave out is therefore one of the most crucial questions in deconfounding data analyses and, more broadly, in causal thinking.
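To make the bias concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not an example taken from Chapter 1: the variables `z` (joint cause), `x` (cause of interest), and `y` (effect), the coefficients, and the helper `ols_coefs` are all hypothetical. The true effect of `x` on `y` is set to 1.0, and we compare the coefficient for `x` with and without `z` in the regression.

```python
import numpy as np

# Hypothetical data-generating process: z is a joint cause of x and y.
rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                       # joint cause (confounder)
x = 2.0 * z + rng.normal(size=n)             # cause of interest, driven by z
y = 1.0 * x + 3.0 * z + rng.normal(size=n)   # true effect of x on y is 1.0


def ols_coefs(predictors, outcome):
    """Least-squares coefficients, intercept first (hypothetical helper)."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


naive = ols_coefs([x], y)[1]        # z left out: coefficient absorbs z's effect
adjusted = ols_coefs([x, z], y)[1]  # z included: close to the true effect

print(f"coefficient for x without z: {naive:.2f}")
print(f"coefficient for x with z:    {adjusted:.2f}")
```

With these assumed coefficients, the regression that leaves out `z` should land near 1 + 3 × cov(x, z) / var(x) = 1 + 3 × 2 / 5 = 2.2, more than double the true effect, while the regression that includes `z` should land close to 1.0.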

Unfortunately, it is a complicated question, and different authors have proposed rules that are more or less expansive. At the more expansive end of the spectrum, you have rules that err toward caution and simplicity; you can think of them as reasoned “everything and the kitchen sink” approaches. At the other end, you have rules that try to zero in on exactly the variables required and nothing else, at the cost of greater complexity and heavier conceptual requirements.

Interestingly, answering that question doesn’t require any data. That is, you may want or need data to build the right CD, but once you have a CD that is correct, you don’t need to look at any data to identify confounding. This puts us squarely on the CD-to-behaviors edge of our framework (Figure 5-1) and ...
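The point that a correct CD is enough to spot confounding, with no data involved, can be sketched in a few lines. The snippet below is a deliberately simplified illustration rather than one of the rules discussed in this chapter: it stores a hypothetical diagram as a dictionary mapping each variable to its direct causes, then lists the variables that cause both `X` and `Y` other than through `X` itself. The diagram, the variable names, and the functions `ancestors` and `joint_causes` are all assumptions made for the example.

```python
def ancestors(parents, node, blocked=frozenset()):
    """Return all direct and indirect causes of `node`, skipping `blocked` nodes."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent in blocked or parent in seen:
                continue
            seen.add(parent)
            stack.append(parent)
    return seen


def joint_causes(parents, cause, effect):
    """Variables that cause `cause` and also cause `effect` without going through `cause`."""
    causes_of_cause = ancestors(parents, cause)
    causes_of_effect = ancestors(parents, effect, blocked={cause})
    return causes_of_cause & causes_of_effect


# Hypothetical CD: Z -> X, Z -> Y, X -> Y, W -> Y
cd = {
    "X": ["Z"],
    "Y": ["X", "Z", "W"],
}

print(joint_causes(cd, "X", "Y"))
```

Here `Z` is flagged because it causes both `X` and `Y`, whereas `W` is not, because it affects only `Y`. Nothing in the check touches a dataset: the answer comes entirely from the structure of the diagram.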
