Errata for The Site Reliability Workbook

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Other Digital Version ~253
Figure 12-1

1. "kittens"
Point 2 says the search term is "kitten", point 4 says it is "kittens" (plural). These should be the same. Adjusting Point 2 to "kittens" makes it consistent with the rest of the figure.

2. CTR definition
Point 6 says the CTR = impressions/clicks, but it should be reversed: clicks/impressions (x 100, but that may not be needed here).

Almer T.  Feb 05, 2021 
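
The conventional definition of click-through rate that the second point relies on can be sketched as follows. This is a minimal illustration, not code from the book; the function name `click_through_rate` and the sample counts are assumptions.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a fraction: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical sample: 5 clicks out of 200 impressions.
ctr = click_through_rate(5, 200)
print(f"{ctr:.1%}")  # multiply by 100 only when displaying as a percentage
```

Dividing the other way (impressions/clicks) would give 40 for the same sample, which is not a rate at all.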
Chapter 12. Introducing Non-Abstract Large System Design
Figure 12-3. Sharding of logs with same query_id to duplicate shards

The box for Log Sharder contains a typo: "hash(as_id)" should be "hash(ad_id)".

Lan Laucirica  Feb 12, 2021 
Other Digital Version 2
Table 2.2

In "The Site Reliability Workbook", chapter 2, Table 2.2, there is a mismatched parenthesis.
It is printed as:
sum(rate(http_requests_total{host="api", status!~"5.."}[7d]))

/

sum(rate(http_requests_total{host="api"}[7d])

but should be

sum(rate(http_requests_total{host="api", status!~"5.."}[7d]))

/

sum(rate(http_requests_total{host="api"}[7d]))

One closing parenthesis is required for the rate function and one for the sum function.

Gururaj Krishnan  Nov 18, 2020 
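
A quick way to confirm the imbalance reported above is to count unclosed parentheses in each version of the expression. The sketch below uses a hypothetical helper (`unclosed_parens` is not part of Prometheus or the book) and assumes that no parentheses occur inside string literals, which holds for these two queries.

```python
def unclosed_parens(expr: str) -> int:
    """Return the number of '(' left open at the end of expr."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without a matching opener")
    return depth


numerator = 'sum(rate(http_requests_total{host="api", status!~"5.."}[7d]))'
printed = numerator + ' / sum(rate(http_requests_total{host="api"}[7d])'
corrected = numerator + ' / sum(rate(http_requests_total{host="api"}[7d]))'

print(unclosed_parens(printed))    # 1: one ')' is missing
print(unclosed_parens(corrected))  # 0: balanced
```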
Printed, PDF Page 34
top (1st paragraph)

"Here we see that a single event consumed around 15% of the
error budget over the course of two days." - the graph shows ~30% error budget consumption.

Florian Rathgeber  Mar 15, 2021 
PDF Page 41
1st paragraph

So if the service offers 99.9% availability (a 0.1% error budget) and we replicate it so that it runs in two zones, it would offer 99.99%, because 0.1 * 0.1 = 0.01 and 100 - 0.01 = 99.99.

Regards.

Alejandro Colomina  Nov 13, 2018 
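
The arithmetic in this report can be checked with a small sketch, assuming that the zones fail independently and that the service is up whenever at least one replica is up. The function name `replicated_availability` is hypothetical. The percentages are converted to fractions before multiplying: 0.1% is 0.001, so two replicas give an unavailability of 0.001 * 0.001 = 0.000001, that is 99.9999% availability under these assumptions, rather than the 99.99% obtained by multiplying the percentage figures 0.1 * 0.1 directly.

```python
def replicated_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` independent copies, each with availability `single`.

    Assumes independent failures and that one healthy replica is enough.
    Both the argument and the result are fractions (0.999 means 99.9%).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a fraction between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    unavailability = (1.0 - single) ** replicas
    return 1.0 - unavailability


print(f"{replicated_availability(0.999, 2):.4%}")  # two zones at 99.9% each
```

Real zones rarely fail fully independently, so the result is best read as an upper bound.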
Printed Page 77
graph

The graph https://landing.google.com/sre/workbook/chapters/alerting-on-slos/#detection-time is incorrect. It is a lin-log graph, and the first tick on the x-axis, which is presumably 0.1%, has been wrongly omitted. If that is the case, then the detection time shown for 0.1% is also incorrect: it should be 10, not ~8.5.

Karl Johan Grahn  May 30, 2019 
Printed Page 78
middle

The detection time equation is first mentioned in the section "Increased Alert Windows", which is too late and makes it very confusing on first reading. The placement is also incorrect, since the equation already holds for the graph in the previous section, "Target Error Rate greater than SLO Threshold".

Karl Johan Grahn  May 30, 2019 
Printed Page 79, 80
graphs

The graphs on the mentioned pages are incorrect: they have a logarithmic y-scale that includes zero, which is mathematically impossible. The lowest value should presumably be 0.01%, not 0.0%.

Karl Johan Grahn  May 30, 2019 
Printed Page 107
Last paragraph

Broken link to Clos topology

Jens Heinrich  Jul 16, 2020 
PDF Page 209
The first action item in Table 10-5. Cleanup/miscellaneous

"bring the admin server backup" should read "bring the admin server back up"? I.e., "backup" vs. "back up".

Kazushige Hosokawa  Feb 19, 2019 
Printed, Other Digital Version Page 249
List item "ad_id"

The text

ad_id
Three 64-bit integers, 8 bytes

... could be read to imply that three 64-bit integers fit into 8 bytes. That is clearly false. It is probably better to clarify this by adding the word "each" like so:

ad_id
Three 64-bit integers, 8 bytes each

(Yes, I'm one of the authors of this chapter)

James Youngman  Jun 25, 2019 
Printed Page 257
Figure 12-3

It should say "ad_id::hash(ad_id)" instead of "ad_id::hash(as_id)".

Jens Heinrich  Aug 14, 2020