Errata

Errata for Database Internals

Submit your own errata for this product.

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color Key: Serious technical mistake Minor technical mistake Language or formatting error Typo Question Note Update

Version	Location	Description	Submitted by	Date submitted
	- Chapter 7 Implementation Details Sorted String Tables 3rd paragraph	Since data files hold records in key order, using hashtables for indexing does not prevent us from implementing range scans, as a hashtable is only accessed to locate the first key in the range, and the range itself can be read from the data file sequentially while the range predicate still matches. > hashtable is only accessed to locate the first key in the range You can't access first key in the range via hashtable if it is not presented in the table.	Eugene Bikkinin	Dec 17, 2021
Printed	Page 17 First table	Figure 1-4 Column Family: anchor timestamps should be t9 and t8	Prabhakhar Kaliyamurthy	Nov 08, 2021
Printed	Page 37 2nd paragraph of B-Tree Lookup Complexity	The logarithm base is the number of child pointers, not the number of keys.	Arjohn Kampman	Jul 21, 2021
PDF	Page 78 last paragraph, "B-Trees are characterized by their fanout: the number of keys stored in each node."	Under the sub heading "B-Tree Hierarchy" on the last paragraph begins with "B-Trees are characterized by their fanout: the number of keys stored in each node" which should actually be "B-Trees are characterized by their fanout: the number of allowed children per node"	Haile Lagi	Mar 20, 2024
Printed	Page 85 Last paragraph	Reference for Belady's anomaly is misspelled [BEDALY69] instead of [BELADY69] (both here, and on page 319 in the bibliography)	Chirag Singh	Mar 07, 2024
Mobi	Page 96 Figure 4-3	In Figure 4-3 the key K4 is missing in the right most node, after the split.	Daniel Rey	Jan 21, 2021
PDF, ePub	Page 98 98	I found small mistake in explanation of properties ensured by backward-oriented concurrency control: it says: > the write set of T2 doesn't intersect with the read or write sets of T1 I think `T2` and `T1` should be switched. Original paper, "On Optimistic Methods for Concurrency Control", also says: > The write set of Ti does not intersect the read set or the write set of Tj, and Ti completes its read phase before Tj completes its read phase.	Anonymous	Oct 24, 2020
Printed	Page 100 2nd paragraph	"However, write operations with a timestamp lower than max_write_timestamp are allowed, since we can safely ignore the outdated written values." should be -> "However, write operations with a timestamp higher than max_write_timestamp are allowed, since we can safely ignore the outdated written values." OR "However, write operations with a timestamp lower than max_write_timestamp are ignored, since we can safely ignore the outdated written values."	Filip Rydzi	Feb 13, 2021
Printed	Page 148 3rd-4th paragraphs	The book states in this section (regarding Bloom filters): "having more hash functions, we can check more bits and have a more precise outcome." And in the following paragraph: "computing results of more hash functions may have a negative performance impact, so we have to find a reasonable middle ground between acceptable probability and incurred overhead. Probability can be calculated from the expected set size." Adding more hash functions doesn't always make the outcome more precise; there is a limit after which adding hash functions makes the outcome less precise - because more hash functions means more bits are flipped to 1 during construction, which increases the false positive rate. An optimum number of hash functions can be calculated given the filter size and set size. Performance wouldn't be a consideration, with the possible exception of extremely small set sizes. In most cases, the optimum number of hash functions can be calculated and used. "Probability can be calculated from the expected set size." should be "Probability can be calculated from the number of hash functions and the expected set size."	Stephen Cleary	Sep 06, 2022
Printed	Page 149 5th paragraph under Skiplist section	The second sentence says that if the node under consideration is greater than the search key, the predecessor's next level should be followed. However, this is identical to the behavior described for when the node is less than the search key.	Kevin Nygaard	Mar 08, 2022
PDF	Page 172 Second last paragraph.	"Using different consistency models, we can constraint or relax the number of states the system can be in." Should be: "Using different consistency models, we can constrain or relax the number of states the system can be in." "constraint" is the noun. "constrain" is the verb.	Dawei Li	Aug 26, 2022
PDF	Page 174 Second line of the last paragraph.	"spending millions of dollars to reduce latency by several milliseconds to able to" should be: "spending millions of dollars to reduce latency by several milliseconds to be able to"	Dawei Li	Aug 26, 2022
Printed	Page 199 Section heading	“Phi-Accural Failure Detector” heading has a typo. It should be “Accrual”, instead of “Accural”.	Andriy Vityuk	Jan 01, 2023
PDF	Page 260 4th paragraph	The author named the first phase of the two-phase commit as "Prepare", while he mentioned this as "Propose Phase" in the illustration. This becomes even more obvious in the following Three-phase commit section. The author says: "3PC adds a prepare phase before the commit/abort step" (P264). So the author wrongly named the first phase in the two-phase commit as "Prepare", where it should be "Propose".	Xiaoye Wang	May 29, 2022
PDF	Page 265 3	The author explains how 3PC causes the "split brain" problem in a confusing way. In the 4th paragraph of P265, the author said: "As soon as all the cohorts successfully move into the prepared state and the coordinator has received their prepare acknowledgments, the transaction will be committed if either side fails." "the transaction will be committed if either side fails"? is there a causation relationship between "transaction committed" and "either side fails"? I can't see it from the explanation. From the illustration, it shows that the transaction will only be committed if the 2nd phase succeeds and moves to the third phase. And any failure in the process should cause an abort, not a commit. Then in the last paragraph of this page, "Some nodes successfully move to the prepared state, and now can proceed with commit after the timeout." Is this part of the specification of timeout handling? Where are the sources of the explanation, I can see various explanations online, some say "timeout in the 2nd phase causes commit", and some say timeout in the 2nd phase causes abort. I think the author should explain why the "timeout" here causes commit because this is important in understanding the "split brain" problem in a network partition.	Xiaoye Wang	May 29, 2022
ePub	Page 337 2nd or 3rd (reading as epub)	Under section Distributed Transactions with Calvin, a sentence goes as "As soon as a transaction batch is successfully replicated, sequencer forwards it to the scheduler , which orchestrates transaction execution." Replicated should be created since no replication is done at this stage, unless there are multiple sequencer nodes, something which isn't introduced until 2 pages later. Async replication can also be used between sequencers, making replication a prerequiste to scheduling non ideal is most cases	Rene Miguel Cudaihl	Dec 21, 2020