Errata for Designing Data-Intensive Applications
The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version |
Location |
Description |
Submitted By |
Date Submitted |
Date Corrected |
Safari Books Online |
ch04
"Dynamically Generated Schemas", 2nd paragraph |
In the text below:
[...] problems with textual formats (JSON, CSV, SQL)
"SQL" is obviously not a textual format. In this context, the author was probably referring to "XML".
The resulting fixed line would be:
[...] problems with textual formats (JSON, CSV, XML)
Note from the Author or Editor: Erratum is correct, I have corrected the text in Atlas
|
Punleuk Oum |
Apr 30, 2018 |
Jun 01, 2018 |
Safari Books Online |
ch04
"code generation and dynamically typed languages", 3rd paragraph |
"[...] code generation is an unnecessarily obstacle to getting to the data."
->
"[...] code generation is an unnecessary obstacle to getting to the data."
Note from the Author or Editor: I have made this change in Atlas
|
Punleuk Oum |
Apr 30, 2018 |
Jun 01, 2018 |
Safari Books Online |
ch 6
references |
Reference [11] Andrew Wang: “Windows Azure Storage,” umbrant.com, February 4, 2016. should link to https://www.umbrant.com/2016/02/04/windows-azure-storage/
Note from the Author or Editor: URL of the blog post has changed. We're updating it to https://www.umbrant.com/2016/02/04/windows-azure-storage/
|
David Waller |
Oct 01, 2018 |
Nov 21, 2018 |
Safari Books Online |
Ch 11
references |
Reference [18] Jay Kreps, Neha Narkhede, and Jun Rao: “Kafka: A Distributed Messaging System for Log Processing,” is no longer available at that URL. Suggested alternative: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/09/Kafka.pdf
Note from the Author or Editor: I have updated the URL in Atlas and on https://github.com/ept/ddia-references
|
David Waller |
Nov 30, 2018 |
Mar 15, 2019 |
PDF |
Page x
Top |
New types of database [system] (“NoSQL”) have been getting lots of attention, but message queues, caches, search indexes, frameworks...
"system" should be "systems".
Note from the Author or Editor: Fixed in next Early Release update.
|
Anonymous |
Aug 10, 2015 |
Mar 01, 2017 |
Safari Books Online |
Chapter 1 |
In this Chapter 1, we will start by exploring the fundamentals of what we are trying to achieve: reliabile, scalabile and maintainable data systems
reliabile -> reliable
scalabile -> scalable
Note from the Author or Editor: Fixed in next Early Release update.
|
Sascha Gottfried |
Sep 23, 2015 |
Mar 01, 2017 |
Safari Books Online |
Ch 4
Chapter 4, section CODE GENERATION AND DYNAMICALLY TYPED LANGUAGES |
The word "compilation" is misspelled as "compliation":
Code generation is often frowned upon in these languages, since they otherwise avoid an explicit compliation step.
Note from the Author or Editor: Fixed in next early release update.
|
Philippe Derome |
May 23, 2016 |
Mar 01, 2017 |
Safari Books Online |
Ch 4
|
The following choice of words feels awkward ("unpack"): In the rest of this chapter we will unpack some of the most common ways how data flows between processes:
It would seem that "reveal", "show", or "describe" would be a more common fit than "unpack".
Note from the Author or Editor: Fixed in next early release update.
|
Philippe Derome |
May 23, 2016 |
Mar 01, 2017 |
Safari Books Online |
5
Chapter 8 |
In the sub-section "The truth is defined by the majority" of section "Knowledge, Truth and Lies", a typo in the paragraph below figure 8-5:
However, the storage server rembers that it has already processed a write with a higher token number (34), and so it rejects the request with token 33.
"rembers " -> "remembers"?
Note from the Author or Editor: Fixed in next early release update.
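The quoted sentence describes fencing tokens: the storage server remembers the highest token it has processed and rejects any write carrying an older one. A minimal Python sketch, assuming a single in-memory store; the class and method names are hypothetical, not taken from the book:

```python
class FencedStorage:
    """Toy storage service that checks fencing tokens (hypothetical API)."""

    def __init__(self) -> None:
        self.max_token_seen = -1
        self.data: dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        # Reject writes from a client whose lease has been superseded:
        # its token is lower than one the server has already processed.
        if token < self.max_token_seen:
            return False
        self.max_token_seen = token
        self.data[key] = value
        return True


storage = FencedStorage()
storage.write(34, "file", "written by client 2")  # accepted
storage.write(33, "file", "written by client 1")  # rejected, since 33 < 34
```

Equal tokens are accepted so that the current lease holder can make several writes with the same token.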
|
Anonymous |
Nov 10, 2016 |
Mar 01, 2017 |
PDF |
Page 7
2nd Paragraph |
"for example, randomly killing individual processes without warning — is known as chaos monkey"
I don't think it is correct to use 'chaos monkey' as an umbrella term in this context: Chaos Monkey is a software application developed to perform that task, not a term with that meaning.
Note from the Author or Editor: Fixed in next Early Release update.
|
blankenshipz |
Jun 30, 2015 |
Mar 01, 2017 |
PDF |
Page 15
Box 'Percentiles in Practice', First paragraph, last sentence |
"As it takes just one slow call to make the entire end-user request slow, rare slow calls to the backend become much more frequent at the end-user request level" should probably be reworded.
1. Saying "the backend" can be misleading, and is technically inaccurate. It's *multiple* backends. It's only with multiple backends that this increase in frequency of slow end-user requests makes sense.
2. It would be more technically accurate and better to say that it is the *collective frequency* of any slow call to any backend that increases. The slow calls themselves do not exactly increase in frequency.
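The submitter's second point can be made concrete with a little arithmetic. Assume each backend call is independently slow with probability p, and an end-user request waits for n such calls. The chance that at least one of them is slow is 1 - (1 - p)^n. Independence is a simplifying assumption made here for illustration:

```python
def prob_request_slow(p_slow_call: float, num_calls: int) -> float:
    """Probability that at least one of num_calls independent backend calls is slow."""
    return 1.0 - (1.0 - p_slow_call) ** num_calls


# With 1% of backend calls slow, a fan-out to 100 backends makes roughly 63%
# of end-user requests slow, even though no individual backend got slower.
for n in (1, 10, 100):
    print(n, round(prob_request_slow(0.01, n), 3))
```

The per-call slowness rate stays at 1%. What grows with n is the collective frequency of any slow call within a request.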
|
Jeffrey 'jf' Lim |
Dec 19, 2015 |
Mar 01, 2017 |
PDF |
Page 26
2nd paragraph |
"data model" should be pluralized in 'There are many different kinds of data model' (should be 'There are many different kinds of data models')
|
Jeffrey 'jf' Lim |
Dec 19, 2015 |
Mar 01, 2017 |
Printed |
Page 50
1st paragraph |
"Besides these, there are also imperative graph query languages such as Gremlin..."
I believe that Gremlin supports both imperative and declarative traversals. The wikipedia page is actually a useful reference here: https://en.wikipedia.org/wiki/Gremlin_(programming_language)
Note from the Author or Editor: Correct, the declarative features seem to have been added to Gremlin since I last looked at it. I will reword this sentence to avoid the confusion.
|
Jeff Carpenter |
Dec 19, 2017 |
Mar 16, 2018 |
Safari Books Online |
53
Designing Data-Intensive Applications Chapter 2. The Battle of the Data Models |
... where the the database ...
better written with just one 'the'
Note from the Author or Editor: Fixed in next Early Release update.
|
Klaus Ita |
Aug 06, 2015 |
Mar 01, 2017 |
PDF |
Page 53
last paragraph |
The following sentence has a typo:
"In a graph database, there are is no such restriction: any vertex can have an edge to any other vertex."
It should be:
"In a graph database, there is no such restriction: any vertex can have an edge to any other vertex."
|
Slavcho Slavchev |
Jan 23, 2016 |
Mar 01, 2017 |
PDF |
Page 73
last paragraph of this page |
"The merging and compaction of frozen segments can be done in a background thread...continue to serve read and write requests as normal, using the old segment files"
What "frozen" means is a little vague. How can the old segment files still be written to once they have been frozen or closed?
Note from the Author or Editor: Rephrasing this sentence to be clearer in the next update.
|
yuxh |
May 04, 2019 |
Aug 09, 2019 |
PDF |
Page 76
2nd-3rd paragraphs |
TLDR for the below comments: The second and third paragraphs downplay the differences with Bitcask, which was pretty confusing to me at first.
Unless I'm mistaken, this sentence ("We also require that each key only appears once within each merged segment file (the compaction process already ensures that).") would be more accurately/helpfully written ("We also require that each key only appears once within each segment file. Incoming keys are consolidated by a tree structure that we will discuss shortly.")
This resolves my first confusion, because I thought you were implying that segment files could have multiple entries per key until they were merged.
Also, the sentence "At first glance, that requirement seems to break our ability to use sequential writes, but we’ll get to that in a moment" might be more helpfully written "This means that we cannot append new keys directly to the segments (as in Bitcask) because we can only have one entry per key/segment. However, creation of new segment files is still performed using sequential writes, as we will show in a moment."
This resolves my second confusion, because I thought you were implying that you would show how all new data could still be written directly to the segments.
Note from the Author or Editor: Thanks for the suggested wording improvements. In the next update of the book I will tweak the wording to avoid this confusion.
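The requirement under discussion (each segment is sorted by key, and each key appears at most once per segment) is what allows segments to be merged in a single mergesort-like pass. A minimal sketch, assuming each segment is an in-memory list of (key, value) pairs sorted by key, and that later segments in the list are newer. The function name is hypothetical:

```python
import heapq


def merge_segments(segments):
    """Merge sorted segments into one sorted segment with one entry per key.

    segments: list of lists of (key, value) pairs, each sorted by key;
    a higher list index means a newer segment.
    """
    # Tag entries with -index so that, for equal keys, the newest segment
    # sorts first in the merged stream.
    tagged = [[(key, -i, value) for key, value in seg] for i, seg in enumerate(segments)]
    merged = []
    for key, _neg_index, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # an older value for a key we already emitted; discard it
        merged.append((key, value))
    return merged


print(merge_segments([[("a", 1), ("b", 2)], [("b", 3), ("c", 4)]]))
```

Because every input is already sorted, the merge only reads each segment sequentially and writes the output sequentially.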
|
Stephen Dewey |
Sep 10, 2018 |
Nov 21, 2018 |
Safari Books Online |
82
3rd paragraph from the bottom |
Swagger is mentioned as a description language for RESTful APIs; however, this information is neither entirely correct nor complete. Swagger is an old name (since Nov 5, 2015, though still used informally). The current official name is "OpenAPI" (https://www.openapis.org). Swagger is an API documentation tool, and even though it is designed for RESTful APIs, it is also used as an interactive documentation tool for other types of HTTP APIs.
Moreover Swagger/Open API is not the only tool for API documentation and design. Other popular tools include:
* RAML (http://raml.org/)
* API Blueprint (https://apiblueprint.org)
Note from the Author or Editor: Making an appropriate change in QC1 review, to be included in QC2.
|
Andrzej Jarzyna |
Feb 03, 2017 |
Mar 01, 2017 |
PDF |
Page 87
3rd paragraph |
"increasinly" should be "increasingly"
Note from the Author or Editor: Fixed in next Early Release update.
|
Greg Nofi |
Nov 05, 2015 |
Mar 01, 2017 |
PDF |
Page 107
line 5 |
Is this right?
"From time to time to time": I think this is a typo for "From time to time".
Note from the Author or Editor: Remove spurious "to time".
|
DaeMyung Kang |
Dec 12, 2014 |
Mar 01, 2017 |
PDF |
Page 137
Under heading |
There are two common ways how data is distributed across multiple nodes:
/del "how"
Note from the Author or Editor: Fixed in next Early Release update.
|
Anonymous |
Aug 10, 2015 |
Mar 01, 2017 |
PDF |
Page 138
"Distributed actor frameworks" section |
The "Distributed Actor Frameworks" section is missing important background information. It doesn't really describe why we would want to use such a framework, and it doesn't explain how the frameworks can still be useful despite the potential for lost messages. To make this section useful, I think it would be worth adding a paragraph or two to address these points.
Note from the Author or Editor: [No change in this edition]
We have noted this suggestion and will take it into account when preparing a second edition of the book.
|
Stephen Dewey |
Sep 17, 2018 |
Nov 21, 2018 |
PDF |
Page 190-191
Final paragraph ("However, if you want to allow...") |
It would help to address how tombstones help with deletes during concurrent writes (not just how it helps with cleaning up siblings after the fact). In the shopping cart example, if the 4th write was "delete milk, delete eggs, add ham" and a tombstone was added indicating that milk and eggs were deleted at version 4, you would still have milk and eggs coming back in the next write at version 5 (based on version 3).
The question then is whether the database assumes that milk and eggs were only included in version 5 because they were part of version 3 (in which case it could delete them now) or whether the database assumes the user was reaffirming that they wanted milk and eggs (in which case the new write should overwrite the tombstone). It doesn't seem like there's an easy answer because there isn't enough information to really know what the intent was.
Note from the Author or Editor: [No change in this edition]
We have noted this suggestion and will take it into account when preparing a second edition of the book.
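To make the question concrete: if siblings are merged by taking the union of the carts, a removed item reappears whenever any concurrent sibling still contains it. The sketch below implements one possible policy, assumed here purely for illustration, in which a tombstone always wins over a concurrent sibling. As the submitter points out, that policy silently discards a deliberate re-add:

```python
def merge_siblings(siblings: list[set[str]], tombstones: set[str]) -> set[str]:
    """Merge concurrent shopping-cart siblings (hypothetical 'tombstone wins' policy)."""
    merged = set().union(*siblings)  # the union keeps every item present in any sibling
    return merged - tombstones       # then drop items marked as removed


siblings = [{"ham", "flour"}, {"milk", "eggs", "flour"}]
# Without tombstones, the deleted milk and eggs come back from the older sibling.
print(sorted(merge_siblings(siblings, set())))
print(sorted(merge_siblings(siblings, {"milk", "eggs"})))
```

Add-wins alternatives such as the observed-remove set CRDT tag each addition with a unique identifier, so that a removal cancels only the additions it has observed. This distinguishes a stale copy of milk from a fresh re-add, at the cost of extra metadata.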
|
Stephen Dewey |
Oct 15, 2018 |
Nov 21, 2018 |
Printed |
Page 202
2nd paragraph |
After figure 6-2, the text states that Volume 12 of the pictured encyclopedia (Trudeau - Zywiec) contains "words starting with T, U, V, X, Y, and Z." However, assuming that the encyclopedia uses the English alphabet, it would also contain words starting with W.
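The encyclopedia analogy describes key-range partitioning. Each partition owns a contiguous range of keys, from its own lower boundary up to the next partition's boundary, so the last volume covers every key from "Trudeau" onward, W included. A minimal sketch using Python's `bisect` module. The boundaries are made up and are not the encyclopedia's actual volume breaks:

```python
import bisect

# Hypothetical lower boundary of each partition (volume), in sorted order.
# The first boundary is "" so that every key falls into some partition.
BOUNDARIES = ["", "ceylon", "ghana", "moscow", "trudeau"]


def partition_for(key: str) -> int:
    """Return the index of the partition whose key range contains key."""
    return bisect.bisect_right(BOUNDARIES, key.lower()) - 1


print(partition_for("Walrus"), partition_for("Zywiec"))
```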
|
Milo Price |
Dec 28, 2017 |
Mar 16, 2018 |
Printed |
Page 203
5th paragraph |
The book states "Cassandra and MongoDB use MD5", but Cassandra uses Murmur3 hashing.
Note from the Author or Editor: Cassandra prior to version 1.2 used MD5, and version 1.2 switched to using Murmur3 by default. I will clarify this in the text.
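For hash partitioning, each partition owns a range of hash values. The hash function only needs to spread keys evenly, so it does not have to be cryptographically strong. Python's standard library provides MD5 through `hashlib` but has no Murmur3. This sketch therefore uses MD5 purely for illustration; it is not how Cassandra or MongoDB compute their tokens exactly:

```python
import hashlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Map key to one of num_partitions equal-width ranges of a 64-bit hash space."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")  # first 64 bits of the MD5 hash
    return (h * num_partitions) >> 64      # index of the equal-width range h falls in


print(hash_partition("user42", 8))
```

Swapping in a different well-distributed hash function changes which partition a given key lands on, but not the partitioning scheme itself.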
|
Ulf Gitschthaler |
Jun 26, 2017 |
Mar 16, 2018 |
ePub |
Page 222
2nd paragraph |
"each partitions maintains..." should be "each partition maintains..."
Note from the Author or Editor: Fixed in next Early Release update.
|
Anonymous |
Oct 29, 2015 |
Mar 01, 2017 |
Printed |
Page 227
Citation 19 |
Re: SSDs losing data in just weeks when unpowered at unusual temperatures.
The citation does say this, but it in turn refers to a presentation slide that JEDEC has said was misunderstood: https://www.jedec.org/news/pressreleases/jedec-update-solid-state-drive-standard
While the point is certainly valid that SSDs can lose data in storage, the very short time frames given refer to enterprise drives that have reached end of life (EOL). Perhaps a footnote expanding on this alarming statistic would help.
Excellent book by the way, really enjoying it!
Note from the Author or Editor: Thank you for pointing this out; I will update the wording to clarify this point in the next update of the book.
|
Corey Sciuto |
Aug 26, 2018 |
Nov 21, 2018 |
PDF |
Page 241
full page prior to "Indexes and snapshot isolation" |
If you have the time, I'm wondering if you can shed some light on this. I found the discussion in this section to be very confusing.
I think the source of my confusion is that you haven't explained how, if at all, uncommitted rows are kept separate from the list of committed rows in the object version lists that you have shown. Reading between the lines I think the answer is that they are NOT kept separate at all, so an uncommitted write goes immediately into the same list as committed transactions list. Is that true?
That would explain why in rule #1 on page 241 you say writes by transactions which were running at the beginning of a snapshot transaction are ignored "even if" any of those transactions commits. You say "even if" because a transaction could also see the uncommitted writes of earlier transactions that are still running. It needs the list of "transactions that were running when I started" to know that either of the following is true: 1) this transaction is still running and was running when I started, 2) this transaction has committed or aborted (but isn't cleaned up yet) and was running when I started. In both cases it ignores the row.
#2 is also confusing because it seems like a superfluous rule after #1 and #3. Does a transaction need some mechanism to determine "rows from aborted transactions" in addition to rules 1+3? The only way I can think to resolve this is that the assumption in my second paragraph is correct (uncommitted rows are not kept separate) and that additionally, it takes some time to clear aborted uncommitted rows from the object version list (and to unmark objects as deleted which were deleted by aborted transactions). Therefore the transaction needs some second list of "aborted transactions which were not cleared when I started" so it can know to ignore them. Is this true?
The two paragraphs on page 239 from "To implement snapshot isolation" ending in "for an entire transaction" are also pretty confusing. In the first paragraph you say that it's a generalization of the earlier mechanism (which wasn't fully explained, you just said that "any writes by a transaction only become visible to others when that transaction commits"). But then in the second paragraph you imply that MVCC is effectively a distinct mechanism. But the bigger confusion is with that final sentence ("A typical approach"). Why would it make sense to ever base MVCC on a single query? Even read committed is done at the transaction level, not the query level.
Thanks in advance for any clarification you can provide.
Note from the Author or Editor: [No change in this edition]
We have noted this suggestion and will take it into account when preparing a second edition of the book.
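For readers with the same question, the visibility rules referred to above can be sketched as a single check. This assumes, as the submitter guesses, that uncommitted row versions sit in the same version list as committed ones, and that rows carry the IDs of the transactions that created and deleted them. All names are hypothetical. The sketch also shows why rule #2 is not redundant: a transaction with a lower ID that aborted before the snapshot was taken is neither in progress nor later than the reader, so only rule #2 hides its writes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    txid: int                # ID of the reading transaction
    in_progress: frozenset   # IDs of transactions still running when the reader started


@dataclass(frozen=True)
class RowVersion:
    created_by: int
    deleted_by: Optional[int] = None


def effect_visible(snap: Snapshot, writer: int, aborted: set) -> bool:
    """Is a write (create or delete) by transaction `writer` visible to snap?"""
    if writer == snap.txid:
        return True    # a transaction sees its own writes
    if writer in snap.in_progress:
        return False   # rule 1: running at snapshot start, ignored even if it commits later
    if writer in aborted:
        return False   # rule 2: writes by aborted transactions are ignored
    if writer > snap.txid:
        return False   # rule 3: started after the reader
    return True        # rule 4: all other writes are visible


def row_visible(snap: Snapshot, row: RowVersion, aborted: set) -> bool:
    if not effect_visible(snap, row.created_by, aborted):
        return False
    return row.deleted_by is None or not effect_visible(snap, row.deleted_by, aborted)
```

For example, a reader with `txid=13` that started while transaction 12 was running still sees a row deleted by transaction 12.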
|
Stephen Dewey |
Sep 28, 2018 |
Nov 21, 2018 |
PDF |
Page 242
first two paragraphs |
Similar to my earlier question, I think a key missing piece of information here is where this alternative approach places uncommitted writes. It is really hard to understand how this approach is meant to work without knowing that.
Also, I think you probably meant to put these two paragraphs in their own section, because they don't have anything to do with the last header ("Indexes and snapshot isolation").
Note from the Author or Editor: [No change in this edition]
We have noted this suggestion and will take it into account when preparing a second edition of the book.
The section structure is correct. The last two paragraphs describe the copy-on-write approach to maintaining B-tree indexes, which can help with implementing snapshot isolation by using an old B-tree root as the snapshot from which a transaction reads. We will try to make this clearer in the second edition.
|
Stephen Dewey |
Sep 28, 2018 |
Nov 21, 2018 |
Printed |
Page 249
Entire page |
Page 349 appears instead of page 249 on page 249. There is no page 249 content to be found in the book.
Page 349 displays correctly, though is duplicated in two places as a result.
Note from the Author or Editor: this was a printer error in the 4th printing, but has been corrected since then (6th printing was March 2019).
|
Anonymous |
Jul 01, 2019 |
Mar 15, 2019 |
Printed |
Page 253
2nd paragraph |
Under the bulleted list outlining developments that caused a rethink:
RAM became cheap enough that for many use cases is now feasible to keep....
"is now" should read "it is" or "it's".
|
Simon McClive |
Apr 15, 2017 |
Mar 16, 2018 |
PDF |
Page 257
second bullet point in the middle |
You refer to figure 7-1, but figure 7-1 doesn't portray a case of "reading an old version of an object" as you say. Both reads in that figure happen before any writes occur. Perhaps you meant to refer to a different figure.
Also on page 258, remove the "a" from before the word "having".
Note from the Author or Editor: Changing the reference to Figure 7-4 instead of Figure 7-1.
|
Stephen Dewey |
Oct 03, 2018 |
Nov 21, 2018 |
PDF |
Page 281
3rd paragraph |
"packed-switched" in "Ethernet and IP are packed-switched protocols" should be "packet-switched"
Note from the Author or Editor: Will be fixed in QC1
|
Krzysztof Sobusiak |
Jan 02, 2017 |
Mar 01, 2017 |
PDF |
Page 288
3rd-to-last paragraph |
"These jumps, as well as the fact that they often ignore leap seconds, make time-of-day clocks unsuitable for measuring elapsed time"
Based on the reference you linked, it seems the CloudFlare problem was actually that the clock used by its code DID take leap seconds into account, but the application code ignored the fact that this could happen.
So perhaps a better phrasing would be:
"These jumps, as well as similar jumps caused by leap seconds, make time-of-day clocks unsuitable for measuring elapsed time"
In other words the problem isn't that time-of-day clocks ignore leap seconds, it's the reverse, that they are affected by them. But then the application code ignores the fact that this can happen.
Note from the Author or Editor: I agree with the suggested wording change, and have updated the text in Atlas.
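In Python terms, the practical consequence is this. `time.time()` reads the time-of-day clock, which can jump, for example when NTP steps the clock. `time.monotonic()` is not affected by system clock updates and never goes backwards, so it is the right clock for elapsed time. A minimal sketch; the helper name is hypothetical:

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds), measured with a monotonic clock."""
    start = time.monotonic()            # unaffected by system clock updates
    result = fn(*args)
    elapsed = time.monotonic() - start  # never negative, unlike a time.time() difference
    return result, elapsed


result, elapsed = timed(sum, range(1_000_000))
print(result, f"{elapsed:.6f}s")
```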
|
Stephen |
Dec 03, 2018 |
Mar 15, 2019 |
PDF |
Page 293
(Sixth Early Release) Ch8, The leader and the lock, 2nd paragraph |
Minor problems with plurals:
...even if a nodes believes that it is... [change 'nodes' to 'node']
... mean the majority of nodes agrees! ... [change 'agrees' to 'agree']
Note from the Author or Editor: Fixed in next early release update.
|
Ross |
Aug 13, 2016 |
Mar 01, 2017 |
PDF |
Page 302
(Sixth Early Release) Ch8, Summary, 4th paragraph |
The wording feels a bit awkward in - 'The only way how information can flow...'
Perhaps drop 'how'?
'The only way information can flow ...'
Note from the Author or Editor: Fixed in next early release update.
|
Ross |
Aug 13, 2016 |
Mar 01, 2017 |
PDF |
Page 317
4th paragraph |
"...are easier use correctly" should be "...are easier to use correctly"
Note from the Author or Editor: Fixed in copyedit
|
Krzysztof Sobusiak |
Jan 05, 2017 |
Mar 01, 2017 |
Printed |
Page 322
2nd full paragraph, 2nd sentence |
Unnecessary repetition of the word 'first' in the same sentence; keeping the first occurrence and removing the second would fix it:
But first we first need to explore the range of guarantees...
|
Philippe Derome |
Apr 25, 2017 |
Mar 16, 2018 |
ePub |
Page 452
Chapter 8, Figure 8-1 |
Figure 8-1 seems to be a duplicate of Figure 2-4, and does not match the description of what 8-1 is trying to communicate.
|
Donald Kjer |
Jan 24, 2016 |
Mar 01, 2017 |
Mobi |
Page 3131
text |
TYPO:
"commiting the write" should be "committing the write"
Redundancy:
"The blocking of readers and writers is implemented by a having a lock on each object in the database." Should be: "implemented by having a lock on "
Note from the Author or Editor: Fixed in Atlas.
|
Anonymous |
Dec 16, 2019 |
Jan 24, 2020 |
Mobi |
Page 6130
|
Hard to tell because I use a Kindle, so I see locations rather than pages.
At location 6130, in the first paragraph, there is a repeated "to":
"When a transaction wants to to commit."
Note from the Author or Editor: Fixed in next Early Release update.
|
Wilmer Andres Daza Gomez |
Aug 26, 2015 |
Mar 01, 2017 |
Mobi |
Page 10672
throughout |
Notes from Amazon
Your book has external links that do not work, for example at locations 1789, 2763, 2780, 2816, and 5030, and throughout the book (e.g. "Apache CouchDB 1.6 Documentation"). Please update them to valid external URLs. To ensure future access to reference material, Amazon strongly recommends submitting these types of links to an archive service and including the archived link in the book. If the link is broken due to forces outside your control, it should be deactivated and “[URL inactive]” should be added following the link text.
Note from the Author or Editor: I have gone through all URLs in the book and fixed all broken links as of March 2020.
|
Anonymous |
Jan 22, 2020 |
Mar 27, 2020 |
|