Chapter 8. Difference-in-Differences

After discussing treatment effect heterogeneity, it’s time to switch gears and return to average treatment effects. Over the next few chapters, you’ll learn how to leverage panel data for causal inference.

A panel is a data structure with repeated observations of the same units across time. Because you observe the same unit in multiple time periods, you can see what happens to that unit before and after a treatment takes place. This makes panel data a promising alternative for identifying causal effects when randomization is not possible. When you have observational (nonrandomized) data and unobserved confounders are likely present, panel data methods are as good as it gets in terms of properly identifying the treatment effect.
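To make the structure concrete, here is a minimal sketch of a panel in pandas. The data is made up for illustration, and the column names (`city`, `period`, `treated`, `sales`) are assumptions, not names from the book's dataset. It only shows how repeated observations let you compute a before/after change for each unit. That change is a raw ingredient, not yet a causal estimate.

```python
import pandas as pd

# Hypothetical toy panel: two units (cities), each observed in two periods.
panel = pd.DataFrame({
    "city":    ["A", "A", "B", "B"],
    "period":  [0, 1, 0, 1],
    "treated": [0, 1, 0, 0],        # city A is treated in period 1 only
    "sales":   [10.0, 15.0, 8.0, 9.0],
})

# Reshape to one row per unit and one column per period,
# which is only possible because each unit is observed in every period.
wide = panel.pivot(index="city", columns="period", values="sales")

# Before/after change in the outcome for the same unit.
wide["change"] = wide[1] - wide[0]
print(wide)
```

With cross-sectional data, where each unit shows up only once, this per-unit change could not be computed at all.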

In this chapter, you’ll see why panel data is so interesting for causal inference. Then, you’ll learn the most famous causal inference estimator for panel data, difference-in-differences, along with many variations of it. To keep things interesting, you’ll do all of this in the context of figuring out the effect of an offline marketing campaign.

Data Regimes

In contrast to panel data, or a longitudinal design, cross-sectional data is characterized by each unit appearing only once. A third category, which falls between the two, is known as repeated cross-sectional data. This type of data involves multiple time periods, but the units in each period are not necessarily the same. Up until this point, you have worked with ...
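The three data regimes can be told apart with a rough check like the one below. This is a sketch under stated assumptions, not a definitive test: the function name `data_regime` and its arguments are hypothetical, and it only labels a dataset as a panel when every unit appears in every period (a balanced panel). Anything else with more than one period is labeled a repeated cross-section.

```python
import pandas as pd

def data_regime(df, unit_col, time_col):
    """Heuristic label for the data regime (hypothetical helper).

    Assumption: only a balanced panel, where every unit is observed
    in every period, is labeled "panel".
    """
    n_periods = df[time_col].nunique()
    if n_periods == 1:
        return "cross-sectional"
    # Number of distinct periods in which each unit is observed.
    periods_per_unit = df.groupby(unit_col)[time_col].nunique()
    if (periods_per_unit == n_periods).all():
        return "panel"
    return "repeated cross-sectional"

# Made-up examples of each regime.
cross = pd.DataFrame({"unit": [1, 2, 3], "period": [0, 0, 0]})
panel = pd.DataFrame({"unit": [1, 1, 2, 2], "period": [0, 1, 0, 1]})
repeated = pd.DataFrame({"unit": [1, 2, 3, 4], "period": [0, 0, 1, 1]})

for name, df in [("cross", cross), ("panel", panel), ("repeated", repeated)]:
    print(name, "->", data_regime(df, "unit", "period"))
```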
