
Causal Inference in Python

Name: Causal Inference in Python
Author: Matheus Facure
ISBN: 9781098140250

by Matheus Facure

July 2023

Beginner to intermediate

408 pages

12h 1m

English

O'Reilly Media, Inc.


Preface
Prerequisites; Outline; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
I. Fundamentals
1. Introduction to Causal Inference
What Is Causal Inference?; Why We Do Causal Inference; Machine Learning and Causal Inference; Association and Causation; The Treatment and the Outcome; The Fundamental Problem of Causal Inference; Causal Models; Interventions; Individual Treatment Effect; Potential Outcomes; Consistency and Stable Unit Treatment Values; Causal Quantities of Interest; Causal Quantities: An Example; Bias; The Bias Equation; A Visual Guide to Bias; Identifying the Treatment Effect; The Independence Assumption; Identification with Randomization; Key Ideas
2. Randomized Experiments and Stats Review
Brute-Force Independence with Randomization; An A/B Testing Example; The Ideal Experiment; The Most Dangerous Equation; The Standard Error of Our Estimates; Confidence Intervals; Hypothesis Testing; Null Hypothesis; Test Statistic; p-values; Power; Sample Size Calculation; Key Ideas
3. Graphical Causal Models
Thinking About Causality; Visualizing Causal Relationships; Are Consultants Worth It?; Crash Course in Graphical Models; Chains; Forks; Immorality or Collider; The Flow of Association Cheat Sheet; Querying a Graph in Python; Identification Revisited; CIA and the Adjustment Formula; Positivity Assumption; An Identification Example with Data; Confounding Bias; Surrogate Confounding; Randomization Revisited; Selection Bias; Conditioning on a Collider; Adjusting for Selection Bias; Conditioning on a Mediator; Key Ideas
II. Adjusting for Bias
4. The Unreasonable Effectiveness of Linear Regression
All You Need Is Linear Regression; Why We Need Models; Regression in A/B Tests; Adjusting with Regression; Regression Theory; Single Variable Linear Regression; Multivariate Linear Regression; Frisch-Waugh-Lovell Theorem and Orthogonalization; Debiasing Step; Denoising Step; Standard Error of the Regression Estimator; Final Outcome Model; FWL Summary; Regression as an Outcome Model; Positivity and Extrapolation; Nonlinearities in Linear Regression; Linearizing the Treatment; Nonlinear FWL and Debiasing; Regression for Dummies; Conditionally Random Experiments; Dummy Variables; Saturated Regression Model; Regression as Variance Weighted Average; De-Meaning and Fixed Effects; Omitted Variable Bias: Confounding Through the Lens of Regression; Neutral Controls; Noise Inducing Control; Feature Selection: A Bias-Variance Trade-Off; Key Ideas
5. Propensity Score
The Impact of Management Training; Adjusting with Regression; Propensity Score; Propensity Score Estimation; Propensity Score and Orthogonalization; Propensity Score Matching; Inverse Propensity Weighting; Variance of IPW; Stabilized Propensity Weights; Pseudo-Populations; Selection Bias; Bias-Variance Trade-Off; Positivity; Design- Versus Model-Based Identification; Doubly Robust Estimation; Treatment Is Easy to Model; Outcome Is Easy to Model; Generalized Propensity Score for Continuous Treatment; Key Ideas
III. Effect Heterogeneity and Personalization
6. Effect Heterogeneity
From ATE to CATE; Why Prediction Is Not the Answer; CATE with Regression; Evaluating CATE Predictions; Effect by Model Quantile; Cumulative Effect; Cumulative Gain; Target Transformation; When Prediction Models Are Good for Effect Ordering; Marginal Decreasing Returns; Binary Outcomes; CATE for Decision Making; Key Ideas
7. Metalearners
Metalearners for Discrete Treatments; T-Learner; X-Learner; Metalearners for Continuous Treatments; S-Learner; Double/Debiased Machine Learning; Key Ideas
IV. Panel Data
8. Difference-in-Differences
Panel Data; Canonical Difference-in-Differences; Diff-in-Diff with Outcome Growth; Diff-in-Diff with OLS; Diff-in-Diff with Fixed Effects; Multiple Time Periods; Inference; Identification Assumptions; Parallel Trends; No Anticipation Assumption and SUTVA; Strict Exogeneity; No Time Varying Confounders; No Feedback; No Carryover and No Lagged Dependent Variable; Effect Dynamics over Time; Diff-in-Diff with Covariates; Doubly Robust Diff-in-Diff; Propensity Score Model; Delta Outcome Model; All Together Now; Staggered Adoption; Heterogeneous Effect over Time; Covariates; Key Ideas
9. Synthetic Control
Online Marketing Dataset; Matrix Representation; Synthetic Control as Horizontal Regression; Canonical Synthetic Control; Synthetic Control with Covariants; Debiasing Synthetic Control; Inference; Synthetic Difference-in-Differences; DID Refresher; Synthetic Controls Revisited; Estimating Time Weights; Synthetic Control and DID; Key Ideas
V. Alternative Experimental Designs
10. Geo and Switchback Experiments
Geo-Experiments; Synthetic Control Design; Trying a Random Set of Treated Units; Random Search; Switchback Experiment; Potential Outcomes of Sequences; Estimating the Order of Carryover Effect; Design-Based Estimation; Optimal Switchback Design; Robust Variance; Key Ideas
11. Noncompliance and Instruments
Noncompliance; Extending Potential Outcomes; Instrument Identification Assumptions; First Stage; Reduced Form; Two-Stage Least Squares; Standard Error; Additional Controls and Instruments; 2SLS by Hand; Matrix Implementation; Discontinuity Design; Discontinuity Design Assumptions; Intention to Treat Effect; The IV Estimate; Bunching; Key Ideas
12. Next Steps
Causal Discovery; Sequential Decision Making; Causal Reinforcement Learning; Causal Forecasting; Domain Adaptation; Closing Thoughts
Index
About the Author

Content preview from Causal Inference in Python

Chapter 4. The Unreasonable Effectiveness of Linear Regression

In this chapter you’ll add the first major debiasing technique to your causal inference arsenal: linear regression, also known as ordinary least squares (OLS), together with orthogonalization. You’ll see how linear regression can adjust for confounders when estimating the relationship between a treatment and an outcome. But, more than that, I hope to equip you with the powerful concept of treatment orthogonalization. This idea, born in linear regression, will come in handy later on when you start to use machine learning models for causal inference.
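
To make the two ideas above concrete, here is a minimal sketch, not taken from the book, of regression adjustment and treatment orthogonalization on simulated data. Everything in it is an assumption made for illustration: the variable names `x`, `t`, and `y`, the data-generating process, the true effect of 2.0, and the helper functions `ols` and `residualize`. It first regresses the outcome on the treatment alone, then adds the confounder as a control, and finally regresses the residualized outcome on the residualized treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: x confounds both the treatment t and the outcome y.
x = rng.normal(size=n)
t = 0.8 * x + rng.normal(size=n)
y = 2.0 * t + 3.0 * x + rng.normal(size=n)  # true effect of t on y is 2.0


def ols(features, target):
    """Least-squares coefficients, with the intercept as the first entry."""
    design = np.column_stack([np.ones(len(target)), features])
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefs


def residualize(features, target):
    """Part of `target` that a linear model on `features` cannot explain."""
    design = np.column_stack([np.ones(len(target)), features])
    return target - design @ ols(features, target)


# Regressing y on t alone ignores the confounder.
naive_effect = ols(t, y)[1]

# Adding x as a control adjusts for it.
adjusted_effect = ols(np.column_stack([t, x]), y)[1]

# Orthogonalization: strip x out of t and y, then regress residual on residual.
t_res = residualize(x, t)
y_res = residualize(x, y)
orthogonalized_effect = ols(t_res, y_res)[1]

print(f"naive:          {naive_effect:.3f}")
print(f"adjusted:       {adjusted_effect:.3f}")
print(f"orthogonalized: {orthogonalized_effect:.3f}")
```

In this simulated setup the naive coefficient overstates the effect because `x` pushes `t` and `y` in the same direction, while the adjusted and orthogonalized coefficients agree with each other up to floating-point error and land close to the simulated effect.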

All You Need Is Linear Regression

Before you skip to the next chapter because “oh, regression is so easy! It’s the first model I learned as a data scientist” and yada yada, let me assure you that no, you actually don’t know linear regression. In fact, regression is one of the most fascinating, powerful, and dangerous models in causal inference. Sure, it’s more than one hundred years old. But, to this day, it frequently catches even the best causal inference researchers off guard.

OLS Research

Don’t believe me? Just take a look at some recently published papers on the topic and you’ll see. A good place to start is the article “Difference-in-Differences with Variation in Treatment Timing,” by Andrew Goodman-Bacon, or the paper “Interpreting OLS Estimands When Treatment Effects Are Heterogeneous” by Tymon Słoczyński, or even the paper “Contamination Bias in Linear Regressions” by Goldsmith-Pinkham ...



Publisher Resources

ISBN: 9781098140243
