Causal Inference in Python

Name: Causal Inference in Python
Author: Matheus Facure
ISBN: 9781098140250

by Matheus Facure

July 2023

Beginner to intermediate

408 pages

12h 1m

English

O'Reilly Media, Inc.

Preface
Prerequisites | Outline | Conventions Used in This Book | Using Code Examples | O’Reilly Online Learning | How to Contact Us | Acknowledgments
I. Fundamentals
1. Introduction to Causal Inference
What Is Causal Inference? | Why We Do Causal Inference | Machine Learning and Causal Inference | Association and Causation | The Treatment and the Outcome | The Fundamental Problem of Causal Inference | Causal Models | Interventions | Individual Treatment Effect | Potential Outcomes | Consistency and Stable Unit Treatment Values | Causal Quantities of Interest | Causal Quantities: An Example | Bias | The Bias Equation | A Visual Guide to Bias | Identifying the Treatment Effect | The Independence Assumption | Identification with Randomization | Key Ideas
2. Randomized Experiments and Stats Review
Brute-Force Independence with Randomization | An A/B Testing Example | The Ideal Experiment | The Most Dangerous Equation | The Standard Error of Our Estimates | Confidence Intervals | Hypothesis Testing | Null Hypothesis | Test Statistic | p-values | Power | Sample Size Calculation | Key Ideas
3. Graphical Causal Models
Thinking About Causality | Visualizing Causal Relationships | Are Consultants Worth It? | Crash Course in Graphical Models | Chains | Forks | Immorality or Collider | The Flow of Association Cheat Sheet | Querying a Graph in Python | Identification Revisited | CIA and the Adjustment Formula | Positivity Assumption | An Identification Example with Data | Confounding Bias | Surrogate Confounding | Randomization Revisited | Selection Bias | Conditioning on a Collider | Adjusting for Selection Bias | Conditioning on a Mediator | Key Ideas
II. Adjusting for Bias
4. The Unreasonable Effectiveness of Linear Regression
All You Need Is Linear Regression | Why We Need Models | Regression in A/B Tests | Adjusting with Regression | Regression Theory | Single Variable Linear Regression | Multivariate Linear Regression | Frisch-Waugh-Lovell Theorem and Orthogonalization | Debiasing Step | Denoising Step | Standard Error of the Regression Estimator | Final Outcome Model | FWL Summary | Regression as an Outcome Model | Positivity and Extrapolation | Nonlinearities in Linear Regression | Linearizing the Treatment | Nonlinear FWL and Debiasing | Regression for Dummies | Conditionally Random Experiments | Dummy Variables | Saturated Regression Model | Regression as Variance Weighted Average | De-Meaning and Fixed Effects | Omitted Variable Bias: Confounding Through the Lens of Regression | Neutral Controls | Noise Inducing Control | Feature Selection: A Bias-Variance Trade-Off | Key Ideas
5. Propensity Score
The Impact of Management Training | Adjusting with Regression | Propensity Score | Propensity Score Estimation | Propensity Score and Orthogonalization | Propensity Score Matching | Inverse Propensity Weighting | Variance of IPW | Stabilized Propensity Weights | Pseudo-Populations | Selection Bias | Bias-Variance Trade-Off | Positivity | Design- Versus Model-Based Identification | Doubly Robust Estimation | Treatment Is Easy to Model | Outcome Is Easy to Model | Generalized Propensity Score for Continuous Treatment | Key Ideas
III. Effect Heterogeneity and Personalization
6. Effect Heterogeneity
From ATE to CATE | Why Prediction Is Not the Answer | CATE with Regression | Evaluating CATE Predictions | Effect by Model Quantile | Cumulative Effect | Cumulative Gain | Target Transformation | When Prediction Models Are Good for Effect Ordering | Marginal Decreasing Returns | Binary Outcomes | CATE for Decision Making | Key Ideas
7. Metalearners
Metalearners for Discrete Treatments | T-Learner | X-Learner | Metalearners for Continuous Treatments | S-Learner | Double/Debiased Machine Learning | Key Ideas
IV. Panel Data
8. Difference-in-Differences
Panel Data | Canonical Difference-in-Differences | Diff-in-Diff with Outcome Growth | Diff-in-Diff with OLS | Diff-in-Diff with Fixed Effects | Multiple Time Periods | Inference | Identification Assumptions | Parallel Trends | No Anticipation Assumption and SUTVA | Strict Exogeneity | No Time Varying Confounders | No Feedback | No Carryover and No Lagged Dependent Variable | Effect Dynamics over Time | Diff-in-Diff with Covariates | Doubly Robust Diff-in-Diff | Propensity Score Model | Delta Outcome Model | All Together Now | Staggered Adoption | Heterogeneous Effect over Time | Covariates | Key Ideas
9. Synthetic Control
Online Marketing Dataset | Matrix Representation | Synthetic Control as Horizontal Regression | Canonical Synthetic Control | Synthetic Control with Covariants | Debiasing Synthetic Control | Inference | Synthetic Difference-in-Differences | DID Refresher | Synthetic Controls Revisited | Estimating Time Weights | Synthetic Control and DID | Key Ideas
V. Alternative Experimental Designs
10. Geo and Switchback Experiments
Geo-Experiments | Synthetic Control Design | Trying a Random Set of Treated Units | Random Search | Switchback Experiment | Potential Outcomes of Sequences | Estimating the Order of Carryover Effect | Design-Based Estimation | Optimal Switchback Design | Robust Variance | Key Ideas
11. Noncompliance and Instruments
Noncompliance | Extending Potential Outcomes | Instrument Identification Assumptions | First Stage | Reduced Form | Two-Stage Least Squares | Standard Error | Additional Controls and Instruments | 2SLS by Hand | Matrix Implementation | Discontinuity Design | Discontinuity Design Assumptions | Intention to Treat Effect | The IV Estimate | Bunching | Key Ideas
12. Next Steps
Causal Discovery | Sequential Decision Making | Causal Reinforcement Learning | Causal Forecasting | Domain Adaptation | Closing Thoughts
Index
About the Author

Content preview from Causal Inference in Python

Chapter 8. Difference-in-Differences

After discussing treatment effect heterogeneity, it’s time to switch gears a bit and return to average treatment effects. Over the next few chapters, you’ll learn how to leverage panel data for causal inference.

A panel is a data structure with repeated observations of the same units across time. Because you observe the same unit in multiple time periods, you can see, for that unit, what happens before and after a treatment takes place. This makes panel data a promising alternative for identifying causal effects when randomization is not possible. When you have observational (nonrandomized) data and the likely presence of unobserved confounders, panel data methods are as good as it gets in terms of properly identifying the treatment effect.

In this chapter, you’ll see why panel data is so interesting for causal inference. Then, you’ll learn the most famous causal inference estimator for panel data, difference-in-differences, along with many variations of it. To keep things interesting, you’ll do all of this in the context of figuring out the effect of an offline marketing campaign.
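To make the idea concrete before the chapter develops it, here is a minimal sketch of a two-group, two-period difference-in-differences estimate. It is an illustration only, not the book's own code: the toy data, the column names (`city`, `post`, `treated`, `sales`), and the helper `diff_in_diff` are all hypothetical, and the estimate is only meaningful under assumptions such as parallel trends.

```python
import pandas as pd

# Hypothetical toy panel: four cities, each observed before (post=0) and
# after (post=1) a marketing campaign. All names and numbers are made up.
panel = pd.DataFrame({
    "city":    ["A", "A", "B", "B", "C", "C", "D", "D"],
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    "sales":   [10.0, 16.0, 12.0, 18.0, 8.0, 10.0, 9.0, 11.0],
})


def diff_in_diff(df, outcome="sales", treated="treated", post="post"):
    """Change in the treated group's mean outcome minus the change in the
    control group's mean outcome (a 2x2 difference-in-differences)."""
    means = df.groupby([treated, post])[outcome].mean()
    treated_change = means.loc[(1, 1)] - means.loc[(1, 0)]
    control_change = means.loc[(0, 1)] - means.loc[(0, 0)]
    return treated_change - control_change


did_estimate = diff_in_diff(panel)
print(did_estimate)
```

In this toy data the treated cities grow by 6 on average and the control cities by 2, so the sketch attributes the remaining 4 to the campaign. That attribution is only as good as the assumption that both groups would have followed the same trend without it.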

Data Regimes

In contrast to panel data (also called a longitudinal design), cross-sectional data is characterized by each unit appearing only once. A third category, which falls between the two, is known as repeated cross-sectional data. This type of data involves multiple time periods, but the units observed in each period are not necessarily the same. Up until this point, you have worked with ...
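The three data regimes described above can be told apart with a rough check on which units appear in which periods. The sketch below is a heuristic under stated assumptions, not a definitive test: the function `data_regime` and the column names `unit` and `time` are hypothetical, and it would label an unbalanced panel (units missing in some periods) as repeated cross-sectional.

```python
import pandas as pd


def data_regime(df, unit="unit", time="time"):
    """Heuristically label a dataset by its unit/time structure."""
    # One set of observed units per time period.
    units_by_period = [set(group) for _, group in df.groupby(time)[unit]]
    if len(units_by_period) <= 1:
        # A single period: each unit is seen at one point in time only.
        return "cross-sectional"
    first = units_by_period[0]
    if all(units == first for units in units_by_period):
        # The same units are observed in every period.
        return "panel"
    # Several periods, but the units differ from one period to the next.
    return "repeated cross-sectional"


# Hypothetical examples of each regime.
cross = pd.DataFrame({"unit": [1, 2, 3], "time": [2020, 2020, 2020]})
panel = pd.DataFrame({"unit": [1, 2, 1, 2], "time": [1, 1, 2, 2]})
repeated = pd.DataFrame({"unit": [1, 2, 3, 4], "time": [1, 1, 2, 2]})

labels = [data_regime(d) for d in (cross, panel, repeated)]
print(labels)
```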

Publisher Resources

ISBN: 9781098140243

Errata Page
