Errata

Database Internals

Errata for Database Internals

Submit your own errata for this product.

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color Key: Serious technical mistake Minor technical mistake Language or formatting error Typo Question Note Update

Version Location Description Submitted by Date submitted
-
Chapter 7 Implementation Details Sorted String Tables 3rd paragraph

Since data files hold records in key order, using hashtables for indexing does not prevent us from implementing range scans, as a hashtable is only accessed to locate the first key in the range, and the range itself can be read from the data file sequentially while the range predicate still matches.

> hashtable is only accessed to locate the first key in the range

You can't access first key in the range via hashtable if it is not presented in the table.

Eugene Bikkinin  Dec 17, 2021 
Printed Page 17
First table

Figure 1-4

Column Family: anchor

timestamps should be t9 and t8

Prabhakhar Kaliyamurthy  Nov 08, 2021 
Printed Page 37
2nd paragraph of B-Tree Lookup Complexity

The logarithm base is the number of child pointers, not the number of keys.

Arjohn Kampman   Jul 21, 2021 
PDF Page 78
last paragraph, "B-Trees are characterized by their fanout: the number of keys stored in each node."

Under the sub heading "B-Tree Hierarchy" on the last paragraph begins with "B-Trees are characterized by their fanout: the number of keys stored in each node" which should actually be "B-Trees are characterized by their fanout: the number of allowed children per node"

Haile Lagi  Mar 20, 2024 
Printed Page 85
Last paragraph

Reference for Belady's anomaly is misspelled [BEDALY69] instead of [BELADY69] (both here, and on page 319 in the bibliography)

Chirag Singh  Mar 07, 2024 
Mobi Page 96
Figure 4-3

In Figure 4-3 the key K4 is missing in the right most node, after the split.

Daniel Rey  Jan 21, 2021 
PDF, ePub Page 98
98

I found small mistake in explanation of properties ensured by backward-oriented concurrency control: it says:

> the write set of T2 doesn't intersect with the read or write sets of T1

I think `T2` and `T1` should be switched.

Original paper, "On Optimistic Methods for Concurrency Control", also says:

> The write set of Ti does not intersect the read set or the write set of Tj, and
Ti completes its read phase before Tj completes its read phase.

Anonymous  Oct 24, 2020 
Printed Page 100
2nd paragraph

"However, write operations with a timestamp lower than max_write_timestamp are allowed, since we can safely ignore the outdated written values."
should be ->
"However, write operations with a timestamp higher than max_write_timestamp are allowed, since we can safely ignore the outdated written values." OR "However, write operations with a timestamp lower than max_write_timestamp are ignored, since we can safely ignore the outdated written values."

Filip Rydzi  Feb 13, 2021 
Printed Page 148
3rd-4th paragraphs

The book states in this section (regarding Bloom filters): "having more hash functions, we can check more bits and have a more precise outcome." And in the following paragraph: "computing results of more hash functions may have a negative performance impact, so we have to find a reasonable middle ground between acceptable probability and incurred overhead. Probability can be calculated from the expected set size."

Adding more hash functions doesn't always make the outcome more precise; there is a limit after which adding hash functions makes the outcome *less* precise - because more hash functions means more bits are flipped to 1 during construction, which increases the false positive rate. An optimum number of hash functions can be calculated given the filter size and set size.

Performance wouldn't be a consideration, with the possible exception of extremely small set sizes. In most cases, the optimum number of hash functions can be calculated and used.

"Probability can be calculated from the expected set size." should be "Probability can be calculated from the number of hash functions and the expected set size."

Stephen Cleary  Sep 06, 2022 
Printed Page 149
5th paragraph under Skiplist section

The second sentence says that if the node under consideration is greater than the search key, the predecessor's next level should be followed. However, this is identical to the behavior described for when the node is less than the search key.

Kevin Nygaard  Mar 08, 2022 
PDF Page 172
Second last paragraph.

"Using different consistency models, we can constraint or relax the number of states the system can be in."
Should be:
"Using different consistency models, we can constrain or relax the number of states the system can be in."

"constraint" is the noun. "constrain" is the verb.

Dawei Li  Aug 26, 2022 
PDF Page 174
Second line of the last paragraph.

"spending millions of dollars to reduce latency by several milliseconds to able to"
should be:
"spending millions of dollars to reduce latency by several milliseconds to be able to"

Dawei Li  Aug 26, 2022 
Printed Page 199
Section heading

“Phi-Accural Failure Detector” heading has a typo. It should be “Accrual”, instead of “Accural”.

Andriy Vityuk  Jan 01, 2023 
PDF Page 260
4th paragraph

The author named the first phase of the two-phase commit as "Prepare", while he mentioned this as "Propose Phase" in the illustration. This becomes even more obvious in the following Three-phase commit section. The author says: "3PC adds a prepare phase before the commit/abort step" (P264). So the author wrongly named the first phase in the two-phase commit as "Prepare", where it should be "Propose".

Xiaoye Wang  May 29, 2022 
PDF Page 265
3

The author explains how 3PC causes the "split brain" problem in a confusing way.
In the 4th paragraph of P265, the author said: "As soon as all the cohorts successfully move into the prepared state and the coordinator has received their prepare acknowledgments, the transaction will be committed if either side fails."
"the transaction will be committed if either side fails"? is there a causation relationship between "transaction committed" and "either side fails"? I can't see it from the explanation. From the illustration, it shows that the transaction will only be committed if the 2nd phase succeeds and moves to the third phase. And any failure in the process should cause an abort, not a commit.

Then in the last paragraph of this page, "Some nodes successfully move to the prepared state, and now can proceed with commit after the timeout." Is this part of the specification of timeout handling? Where are the sources of the explanation, I can see various explanations online, some say "timeout in the 2nd phase causes commit", and some say timeout in the 2nd phase causes abort. I think the author should explain why the "timeout" here causes commit because this is important in understanding the "split brain" problem in a network partition.

Xiaoye Wang  May 29, 2022 
ePub Page 337
2nd or 3rd (reading as epub)

Under section Distributed Transactions with Calvin, a sentence goes as "As soon as a transaction batch is successfully replicated, sequencer forwards it to the scheduler , which orchestrates transaction execution."

Replicated should be created since no replication is done at this stage, unless there are multiple sequencer nodes, something which isn't introduced until 2 pages later. Async replication can also be used between sequencers, making replication a prerequiste to scheduling non ideal is most cases

Rene Miguel Cudaihl  Dec 21, 2020