Errata

Errata for Causal Inference in Python

The errata list is a list of errors and their corrections that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version Location Description Submitted By Date submitted Date corrected
Page Confidence intervals
A couple of pages in

On Kindle app, error in formula

Mistake: the factor...to get a 1-alpha confidence interval is given by abs(ppf((1-alpha)/2))

Correction: the factor...to get a 1-alpha confidence interval is given by abs(ppf(alpha/2))

Note from the Author or Editor:
Right after “confidence interval is given by”, inside the formula, alpha should replace 1-alpha, so the expression reads abs(ppf(alpha/2)) instead of abs(ppf((1-alpha)/2)). The code that follows is correct and needs no modification.

Francis Doornaert   Aug 17, 2023
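As a quick numerical check of the correction in this entry, the sketch below compares the two expressions for alpha = 0.05. It assumes a standard normal distribution and uses the standard library's NormalDist().inv_cdf as a stand-in for the ppf call in the book; the helper name ci_factor is hypothetical.

```python
from statistics import NormalDist


def ci_factor(alpha: float) -> float:
    """Multiplier z so that estimate +/- z * SE covers 1 - alpha (normal approximation)."""
    # inv_cdf is the inverse CDF, i.e. the same quantity a ppf function returns
    return abs(NormalDist().inv_cdf(alpha / 2))


alpha = 0.05
corrected = ci_factor(alpha)
# The misprinted expression evaluates the inverse CDF near the median instead of the tail
misprinted = abs(NormalDist().inv_cdf((1 - alpha) / 2))

print(round(corrected, 2))   # 1.96
print(round(misprinted, 2))  # 0.06
```

The corrected expression recovers the familiar 1.96 multiplier for a 95% interval, while the misprinted one gives a factor close to zero.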
Printed
Page 22, Section "A visual guide to Bias"
In the image right after "The reason for this is bias, which is depicted in the right plot:”

On the image, on the leftmost curly braces, the equation should be
E[Y|T=1] - E[Y|T=0], not E[Y|T=1] = E[Y|T=0].

Matthew Facure  Sep 01, 2023
Page Chapter 3, Section "Conditioning on a Collider"
Location 3042 of the Kindle version, first formula

The formula states:

E[Y|T=1,R=1] - E[Y|T=1,R=1] = E[Y_1 - Y_0|R=1] + E[Y_0|T=0,R=1] - E[Y_0|T=1,R=1]

I think that on the left side the second term should be E[Y|T=0,R=1], since the expression is the average difference between treated and untreated units who responded to the survey.

Note from the Author or Editor:
The formula should be

E[Y|T=1,R=1] - E[Y|T=0,R=1] = E[Y_1 - Y_0|R=1] + E[Y_0|T=1,R=1] - E[Y_0|T=0,R=1]

Felipe Frigeri  Sep 07, 2023
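The corrected decomposition in the author's note for this entry can be checked on simulated data. The sketch below is only an illustration: the data-generating process, the constant treatment effect of 1.0, and all variable names are assumptions, not taken from the book. Because the effect is constant, E[Y_1 - Y_0|R=1] and the effect among treated respondents coincide.

```python
import random
from statistics import mean

random.seed(0)

rows = []
for _ in range(20000):
    t = random.random() < 0.5            # randomized treatment
    y0 = random.gauss(0, 1)              # potential outcome without treatment
    y1 = y0 + 1.0                        # constant effect of 1.0 (assumption)
    y = y1 if t else y0                  # observed outcome
    # R depends on the observed outcome, so conditioning on R=1 opens a biasing path
    r = (y + random.gauss(0, 1)) > 0.5
    rows.append((t, y0, y1, y, r))

respondents = [row for row in rows if row[4]]
treated = [row for row in respondents if row[0]]
control = [row for row in respondents if not row[0]]

# Left-hand side: E[Y|T=1,R=1] - E[Y|T=0,R=1]
lhs = mean(row[3] for row in treated) - mean(row[3] for row in control)
# First right-hand term: average effect among treated respondents
effect = mean(row[2] - row[1] for row in treated)
# Selection bias: E[Y_0|T=1,R=1] - E[Y_0|T=0,R=1]
selection_bias = mean(row[1] for row in treated) - mean(row[1] for row in control)

print(abs(lhs - (effect + selection_bias)) < 1e-9)  # True
print(selection_bias < 0)                           # True
```

In this setup, untreated units need a higher Y_0 to respond, so the selection bias term is negative and the naive comparison among respondents understates the effect.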
Page Chapter 11 RDD “The IV Estimate”
The code blocks

Firstly, it appears that the cutoff value used in the code is 10k, while it should actually be 5k. This has a downstream effect on the regression models and, consequently, the calculated LATE.

Secondly, the code implies that the ITTE can be directly derived from the conditional coefficient for the intercept in the linear regression. This would be a valid approach if the cutoff were at 0, but it's actually at 5k. This simplification seems to contradict the locality assumption of RDD, stating that the estimator is valid only near the threshold R=c.

Note from the Author or Editor:
In the section Intention to Treat Effect (pg 356 of the printed book), the paragraph right after the table should be updated to:

"Then, let's center the running variable, balance, to shift the threshold to zero. In this case, since the discontinuity is at 5000, you can do this by subtracting 5000 from the balance variable. (This is just a trick to make interpreting the regression parameters easier). Next, you need to regress the outcome variable on the centered running variable R interacted with a dummy for being above the threshold (R > 0):

y_i = \beta_0 + \beta_1 r_i + \beta_2 \mathbb{1}\{r_i>0\} + \beta_3 \mathbb{1}\{r_i>0\} r_i

The parameter estimate associated with crossing the threshold…"

Also, code block 20 should be:

m = smf.ols(f"pv~balance*I(balance>0)",
            df_dd.assign(balance=lambda d: d["balance"]-5000)).fit()

m.summary().tables[1]

The table resulting from this code should match cell 25 of the updated notebook:
https://github.com/matheusfacure/causal-inference-in-python-code/blob/main/causal-inference-in-python/11-Non-Compliance-and-Instruments.ipynb

In the section The IV Estimate, code block 21 should be updated to

def rdd_iv(data, y, t, r, cutoff):
    centered_df = data.assign(**{r: data[r]-cutoff})

    compliance = smf.ols(f"{t}~{r}*I({r}>0)", centered_df).fit()
    itte = smf.ols(f"{y}~{r}*I({r}>0)", centered_df).fit()

    param = f"I({r} > 0)[T.True]"
    return itte.params[param]/compliance.params[param]

rdd_iv(df_dd, y="pv", t="prime_card", r="balance", cutoff=5000)

The result from this code block should also be updated to 732.8534752298891. See code block 27 in the GitHub link above.

Finally, the array just before the Bunching section should be updated to array([655.08214249, 807.83207567]). See code block 30 in the GitHub link above.

Alex Roy  Oct 30, 2023
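One way to see why centering the running variable matters for this entry is to compare the uncentered and centered specifications. The derivation below is a sketch: the symbols b (uncentered balance), c (cutoff), and the alpha coefficients are notation introduced here for illustration and do not appear in the book.

```latex
% Uncentered running variable b_i with cutoff c
y_i = \alpha_0 + \alpha_1 b_i + \alpha_2 \mathbb{1}\{b_i>c\} + \alpha_3 \mathbb{1}\{b_i>c\} b_i

% Jump in the expected outcome at the cutoff
\lim_{b \downarrow c} E[y \mid b] - \lim_{b \uparrow c} E[y \mid b] = \alpha_2 + \alpha_3 c

% Centered running variable r_i = b_i - c, as in the corrected text
y_i = \beta_0 + \beta_1 r_i + \beta_2 \mathbb{1}\{r_i>0\} + \beta_3 \mathbb{1}\{r_i>0\} r_i

% Matching coefficients after substituting b_i = r_i + c
\beta_0 = \alpha_0 + \alpha_1 c, \quad \beta_1 = \alpha_1, \quad \beta_2 = \alpha_2 + \alpha_3 c, \quad \beta_3 = \alpha_3
```

With the centered variable, the coefficient on the threshold dummy equals the jump at the cutoff directly. Without centering, the dummy coefficient alone omits the \alpha_3 c term.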
Page 36
2nd Paragraph

The woman and man values should be switched to make sense with the rest of the paragraph.

"When you look at age, treatment groups seem very much alike, but there seems to be a difference in gender (woman = 1, man = 0)."

Note from the Author or Editor:
It should be "(woman = 0, man = 1)".

Clayton Schoeny  Jul 24, 2023
Page 42
1st Equation

In the equation for the estimate of the standard deviation, the summation should start at i=1, not i=0.

Note from the Author or Editor:
In the equation, it should be i=1, not i=0.

Clayton Schoeny  Jul 24, 2023
Page 48
Practical Example

The equation following "They report the efficacy of the vaccine," is not correct. It is printed as E[Y|T = 0] / E[Y|T = 1], but this would give us a value of 56.5/3.3 = 17.12.

Rather, one way to correctly write the equation is 1 - (E[Y|T = 1] / E[Y|T = 0]).

Note from the Author or Editor:
The equation after "They report the efficacy of the vaccine" should be 1 - (E[Y|T = 1] / E[Y|T = 0]).

Clayton Schoeny  Jul 31, 2023
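A short arithmetic check of this entry, using the two values quoted by the submitter (56.5 for E[Y|T = 0] and 3.3 for E[Y|T = 1]); the variable names are illustrative only.

```python
outcome_control = 56.5  # E[Y|T = 0], value quoted in the erratum
outcome_treated = 3.3   # E[Y|T = 1], value quoted in the erratum

# Equation as misprinted: a ratio far above 1, which cannot be an efficacy
misprinted = outcome_control / outcome_treated

# Corrected equation: 1 - E[Y|T = 1] / E[Y|T = 0]
efficacy = 1 - outcome_treated / outcome_control

print(round(misprinted, 2))  # 17.12
print(round(efficacy, 3))    # 0.942
```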
Printed
Page 58
Code cell 19

The code “np.ceil(16 * no_email.std()**2/0.01)” is missing a **2. It should be
“np.ceil(16 * no_email.std()**2/0.01**2)”. However, this gives a number that is too large to fit the text around this code. A better solution is to change the detectable difference from 1% to 8%.

"So, if you want to craft a cross-sell email experiment where you want to detect an 8% difference, like the one you saw in this conversion email example, you must have a sample size that gives you at least 8% = 2.8SE.
[...]

In [19]: np.ceil(16 * (no_email.std()/0.08)**2)

Out[19]: 103.0
"

Matthew Facure  Sep 01, 2023
Printed
Page 60
Last equation in the chapter.

In the equation right after “you could simplify the sample size formula to:”, there is a ^2 missing. It is
N = 16 * σ^2/δ
but it should be
N = 16 * σ^2/δ^2.

The correct equation can be found on page 58.

Matthew Facure  Sep 01, 2023
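To illustrate why the missing square matters in N = 16 * σ^2/δ^2, the sketch below evaluates the formula for two detectable differences. The standard deviation of 0.2 is a hypothetical value chosen for round numbers, not the one from the book's data, and the helper name sample_size is also an assumption.

```python
import math


def sample_size(sigma: float, delta: float) -> int:
    """Sample size from N = 16 * sigma^2 / delta^2, rounded up."""
    return math.ceil(16 * (sigma / delta) ** 2)


sigma = 0.2  # hypothetical standard deviation of the outcome

print(sample_size(sigma, 0.08))  # 100
print(sample_size(sigma, 0.01))  # 6400

# Dropping the square on delta, as in the misprint, understates N at this scale
print(math.ceil(16 * sigma ** 2 / 0.01))  # 64
```

Because δ enters squared, detecting a 1% difference needs 64 times the sample required for an 8% difference.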
Printed
Page 97
“It projects all the X variables into the outcome dimension and makes the comparison between treatment and control on that projection.”

It should be “It projects the outcome variable into the X variables and makes the comparison between treatment and control on that projection.”

Matthew Facure  Sep 01, 2023
Page 151 (Conditioning on a collider)
After first paragraph

The left-hand side of the formula contains an error that has already been submitted as an erratum by another reader (Felipe Frigeri).

But there is also an error on the right-hand side, in the SelectionBias collection of terms:
E[Y_0|T=0, R=1] - E[Y_0|T=1, R=1]
should be corrected to
E[Y_0|T=1, R=1] - E[Y_0|T=0, R=1]

Note from the Author or Editor:
The rightmost term, above SelectionBias, should be E[Y_0|T=1, R=1] - E[Y_0|T=0, R=1].

Francis Doornaert   Sep 14, 2023
Page 433
Multiple Cohorts charts or code block

The example description and code snippet say the data is subset to the West region, but the example charts are labeled "Multiple Cohorts - North Region".

Note from the Author or Editor:
The first plot in the Staggered Adoption section should read West instead of North. This was already fixed in the book's code, cell 42.
https://github.com/matheusfacure/causal-inference-in-python-code/blob/main/causal-inference-in-python/08-Difference-in-Differences.ipynb

Kara Downey  Sep 14, 2023