Errata for Understanding Compression

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
ePub Page 1
1

Hi,

I was wondering why you're bashing 'math' so hard in your book.

I am not a fan of demonizing math, and you shouldn't give the impression that math is something bad and always hard to learn.

I always mentally discount books that try to bash math or claim that math is not really needed.

You should consider toning down your 'math bashing' and replacing it with something more constructive, like:

You will learn some basic math in this e-book, and for further study there are some superb books that give you deeper insight into compression theory.

Greetings
Tobias

Tobias Köck  May 12, 2016 
PDF Page 10
Paragraph beginning "Since each..."

The word "in" is repeated: "in in our current..."

Biterg  Jun 24, 2016 
PDF Page 11
towards bottom

Binary to decimal description is WRONG.

= 9 + 1 = 10

Should be

= 8 + 2 = 10
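A quick check of the corrected sum, not taken from the book. It assumes the book's example is the bit string 1010, the only four-bit pattern whose place values add up to 8 + 2. Each 1 bit contributes the power of two for its position. The helper name is hypothetical.

```python
def place_values(bits: str) -> list[int]:
    """Powers of two contributed by each 1 bit, most significant first."""
    width = len(bits)
    return [2 ** (width - 1 - i) for i, bit in enumerate(bits) if bit == "1"]

terms = place_values("1010")
print(" + ".join(str(t) for t in terms), "=", sum(terms))   # 8 + 2 = 10
print(int("1010", 2))                                        # 10, Python's own conversion agrees
```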

Biterg  Jun 24, 2016 
Printed Page 15
paragraph text

"which interestingly enough, is exactly the binary representation"

It's not clear whether the result (10) is a coincidence caused by a poor selection of example values, or a deep consequence of the mathematics being demonstrated.

Eric Lawrence  Aug 12, 2016 
Printed Page 16
First equation

I haven't done "real" math in a long time, but isn't there an extra minus sign in:

log2(x) = -(log(x) / log(2))

It should instead be:

log2(x) = (log(x) / log(2))

right?

e.g. https://en.wikipedia.org/wiki/Binary_logarithm#Conversion_from_other_bases
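A numeric check of the change-of-base formula. Python's `math.log` is the natural logarithm and `math.log2` is base 2. Without the leading minus sign, the ratio matches `math.log2`. With the minus sign, log2(8) would come out as -3, yet 2^-3 is 1/8, not 8.

```python
import math

def log2_change_of_base(x: float) -> float:
    """log2(x) = log(x) / log(2) using natural logs; there is no leading minus sign."""
    return math.log(x) / math.log(2)

for x in (0.5, 8, 10, 83):
    print(x, log2_change_of_base(x), math.log2(x))
```

The two columns agree to floating-point precision.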

Eric Lawrence  Aug 12, 2016 
ePub Page 17
about BWT

Hi,
On page 17 you say: "Researchers were able to find that the BWT algorithm was the most efficient way to store DNA information in a compressed form"

But BWT is not a compression algorithm.

In fact, you say as much on page 125: "This does not provide compression in-and-of-itself, but allows you to hand off the transformed stream to other compression systems."

So I think on page 17 you could say a little more about genome compression.

For example, you could discuss horizontal and vertical compression and present the major results. You could also cover compression of reads (FASTQ), also called NGS sequencing data compression.

Congratulations on your work.
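The sketch below illustrates the submitter's point with a naive Burrows-Wheeler transform in Python. It is quadratic and meant for illustration only, not the book's code. The transform returns one output character per input character, plus an end marker, so on its own it saves nothing. Its value is that it tends to cluster repeated characters, which a later stage can exploit: move-to-front or run-length coding followed by an entropy coder. The `$` sentinel is an assumption of this sketch and must not occur in the input.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort every rotation of text+sentinel and keep the last column."""
    s = text + sentinel                       # unique end marker makes the transform invertible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

print(bwt("banana"))                          # annb$aa: same length as "banana$", letters regrouped
print(len("GATTACA") + 1, len(bwt("GATTACA")))
```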

Kelvin Kredens  Jun 13, 2016 
PDF Page 21
3rd paragraph

iPod was introduced in 2001, not 1998. I remember because the original iMac came out in ’98, and the iPod / CD-RW were after that, when Apple was pitching the Mac as the “digital hub” of our homes.

significant.bit  May 29, 2016 
Printed Page 21
Just before end of section

The phrase:

"(give or take a few qubits)."

seems like it was intended to be some sort of joke. Is that the case, or have I missed something somewhere along the way?

Given that, as the authors say, "math is hard," it seems inappropriate to include a misleading aside here.

Eric Lawrence  Aug 12, 2016 
PDF Page 23
caption for Figure 2-2

A couple of issues: "Lenna" and the phrase "full portrait:" are each repeated.

significant.bit  May 29, 2016 
PDF Page 30
1st paragraph

Hi, your book is really interesting. The error I found on page 30 is a minor one:

2^3 = 8
2^1=2

And you put

2^3=9
2^1=1

Anonymous  May 30, 2016 
PDF Page 31
definition

"information theory" is not bolded like other definitions

significant.bit  May 29, 2016 
PDF Page 34
3rd paragraph

The following sentence is incorrect:

Perhaps the simplest way to encode some text information would be to number all of
the English characters—A to Z—with numeric values 0–26.

It should be "0-25" at the end of the sentence.

Paul Dacus  Jul 04, 2017 
PDF Page 34
2nd paragraph

The following sentences are incorrect:

Perhaps the simplest way to encode some text information would be to number all of
the English characters—A to Z—with numeric values 0–26. You could then use the
number of pulses, along with pairing, to determine what digit you were transmitting.
For example, you could translate “THE HAT” into 20-8-5 8-1-20.

Either the numeric values need to be "1-26" rather than "0-26", or 1 needs to be subtracted from each coded value at the end of the last sentence, e.g. "19-7-4 7-0-19".
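Both page 34 reports come down to the same count. A to Z is 26 letters, so the values run 0-25 or 1-26, never 0-26 (which would be 27 values). The small sketch below is not from the book and its function name is hypothetical. It reproduces the "THE HAT" example under both numberings.

```python
def number_letters(text: str, first_value: int) -> str:
    """Number A..Z consecutively from first_value; hyphens join letters, spaces separate words."""
    return " ".join(
        "-".join(str(ord(ch) - ord("A") + first_value) for ch in word)
        for word in text.upper().split()
    )

print(number_letters("THE HAT", 1))   # 20-8-5 8-1-20  (A..Z -> 1..26)
print(number_letters("THE HAT", 0))   # 19-7-4 7-0-19  (A..Z -> 0..25)
print(ord("Z") - ord("A"))            # 25, the largest value when A is 0
```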

Paul Dacus  Jul 04, 2017 
PDF Page 37
footnote 3

"implimentations" should be "implementations"

significant.bit  May 29, 2016 
PDF Page 38
1st paragraph

"This is set in the mathematical sense: a group of numbers..."

"Symbols" or "items" might be a better choice than "numbers".

significant.bit  May 29, 2016 
PDF Page 38
1st line

The original sequence has 5 Ds, not 4 as described later in the example.

significant.bit  May 29, 2016 
PDF Page 38
near bottom of page

“… your Entropy for the set.” Unclear whether you’re talking about the whole dataset or the subset of symbols used [ABCD].

significant.bit  May 29, 2016 
PDF Page 40
after the definition

"To be practical and concrete, let’s start with a groupof letters, say:"

needs a space

"To be practical and concrete, let’s start with a group of letters, say:"

Anonymous  Jun 19, 2016 
PDF Page 42
mid page

“This type of transform is known as Delta Coding, or the process of encoding a series of numbers as the difference from the previous number.”

"Value" might be a better word than "number" here:

"... a series of values ... from the previous value."

significant.bit  May 29, 2016 
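The next sketch, which is not the book's code, shows delta coding as the quoted definition above describes it. It keeps the first value, then stores each value as the difference from the previous one; a running sum restores the original. The sample readings are made up for illustration.

```python
from itertools import accumulate

def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store each value as the difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    """A running sum undoes the differencing."""
    return list(accumulate(deltas))

readings = [100, 102, 103, 103, 107]   # made-up sample values
encoded = delta_encode(readings)
print(encoded)                         # [100, 2, 1, 0, 4]
print(delta_decode(encoded))           # [100, 102, 103, 103, 107]
```

The small differences need fewer bits than the raw values, and a variable-length code applied afterwards can take advantage of that.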
Printed Page 48
Elias Gamma table

The values for n=1 and n=2 don't seem to be right.

The code for n=2 should probably be "100", because you encode the value of 2 by doing (2^1 + 0). The code "101" would match n=3.

In the table on the prior page, it's explained that the number 0 is not representable in unary, which suggests that n=1 cannot use (2^0), which means that the encoding of "0" must be some kind of special case that isn't listed in the algorithm.
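For reference, here is a minimal Python sketch of Elias gamma as it is usually defined; it is not the book's code. The code writes N = floor(log2 n) in unary, then L = n - 2^N in N bits.

The `ones_first` flag is an assumption of this sketch. The textbook form writes the unary part as N zeros ended by a 1, so 2 becomes 010 and 3 becomes 011. Writing it as N ones ended by a 0 gives 100 and 101, the codes the submitter expects.

Under either convention, n = 1 needs no special case. N = 0, so the code is only the one-bit terminator.

```python
def unary(k: int, ones_first: bool) -> str:
    """k repeated bits plus a terminator: 1...10 if ones_first, else 0...01."""
    return "1" * k + "0" if ones_first else "0" * k + "1"

def elias_gamma(n: int, ones_first: bool = False) -> str:
    """Unary N = floor(log2 n), then L = n - 2^N in exactly N bits."""
    if n < 1:
        raise ValueError("Elias gamma codes positive integers only")
    big_n = n.bit_length() - 1
    remainder = n - (1 << big_n)
    low_bits = format(remainder, f"0{big_n}b") if big_n else ""
    return unary(big_n, ones_first) + low_bits

def decode_gamma(bits: str, ones_first: bool = False) -> tuple[int, str]:
    """Decode one code word; return (value, remaining bits)."""
    terminator = "0" if ones_first else "1"
    big_n = bits.index(terminator)          # length of the unary run
    body = bits[big_n + 1:]
    low, rest = body[:big_n], body[big_n:]
    return (1 << big_n) + (int(low, 2) if low else 0), rest

for n in range(1, 6):
    print(n, elias_gamma(n), elias_gamma(n, ones_first=True))
```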

Eric Lawrence  Aug 15, 2016 
Printed Page 49
Explanation following Elias Delta table

The algorithm specified after "To decode Elias delta" seems like it would fail to correctly decode when the code is "0" as there wouldn't be any way to tell if the code word is done.

Following the encoding algorithm before the table, I encode n=1 as "10".
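Below is a sketch of the textbook Elias delta code, again not the book's code; the helper names are hypothetical. It gamma-codes N + 1, where N = floor(log2 n), then appends the N bits of n that follow its leading 1. In this convention, 1 encodes to the single bit 1. The decoder always knows where a code word ends: the count of leading zeros fixes the length of the gamma part, and that part fixes N.

```python
def elias_gamma(n: int) -> str:
    """Textbook Elias gamma: floor(log2 n) zeros, then n in binary."""
    return "0" * (n.bit_length() - 1) + format(n, "b")

def elias_delta(n: int) -> str:
    """Gamma-code N + 1 (N = floor(log2 n)), then the N bits of n after its leading 1."""
    if n < 1:
        raise ValueError("Elias delta codes positive integers only")
    big_n = n.bit_length() - 1
    return elias_gamma(big_n + 1) + format(n, "b")[1:]

def decode_delta(bits: str) -> tuple[int, str]:
    """Decode one code word; return (value, remaining bits)."""
    zeros = bits.index("1")                      # leading zeros give the gamma length
    n_plus_1 = int(bits[zeros:2 * zeros + 1], 2)
    start = 2 * zeros + 1
    big_n = n_plus_1 - 1                         # number of bits still to read
    low = bits[start:start + big_n]
    return (1 << big_n) | (int(low, 2) if low else 0), bits[start + big_n:]

for n in (1, 2, 3, 4, 17):
    print(n, elias_delta(n))
```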

Eric Lawrence  Aug 15, 2016 
Printed Page 50
VarInt block

The algorithm here says that the "lower 7 bits are used to store the two's complement representation of the number", language mirrored in Google's docs (https://developers.google.com/protocol-buffers/docs/encoding#varints).

Mentioning "two's complement" rather than just "binary representation" implies that the scheme supports negative numbers, but it's not clear how that would work. After stripping the MSB from each byte, is the most-significant-remaining-bit of the last byte a flag indicating whether the number is negative? Or does this scheme not handle negative numbers and the use of the term "two's complement" is just misleading?
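The varint itself only stores non-negative integers. Negative numbers have to be mapped to a non-negative value first, and the protobuf encoding documentation describes two ways of doing so:

- `int32`/`int64` fields write a negative value as its 64-bit two's-complement bit pattern, which always takes ten bytes.
- `sint32`/`sint64` fields first apply ZigZag encoding, so small negatives stay short.

A minimal Python sketch of the varint and both mappings follows. The function names are hypothetical, not protobuf's API.

```python
MASK64 = (1 << 64) - 1

def encode_varint(value: int) -> bytes:
    """Unsigned base-128 varint: 7 payload bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("map negative numbers to an unsigned value first")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)   # MSB set: more bytes follow
        else:
            out.append(group)          # MSB clear: last byte
            return bytes(out)

def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")

def zigzag(n: int) -> int:
    """Map signed 64-bit ints to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return (n << 1) ^ (n >> 63)

print(encode_varint(300).hex())          # ac02
print(len(encode_varint(-1 & MASK64)))   # 10: two's-complement -1 fills all 64 bits
print(len(encode_varint(zigzag(-1))))    # 1
```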

Eric Lawrence  Aug 15, 2016 
Printed Page 52
top of page

The table header block is unnecessarily repeated (widow) on the next page, presumably because the "a) More information on Elias omega..." text was accidentally widowed over to the next page. This is a bit confusing.

Eric Lawrence  Aug 15, 2016 
Printed Page 61
1st paragraph

When compression formats for video are mentioned, WebM is also mentioned, but WebM is a container format, not a compression codec.

Matteo Contrini  Dec 05, 2017 
PDF Page 64
4th paragraph in the section "Picking the Right Output Value"

It says that the encoding of the string "GGB" is 83, which needs log2(83) ≈ 6.38, i.e. 7 whole bits, and that this works out to 1.42 bits per symbol. However, that calculation looks wrong: we need log2(83) / 3 ≈ 2.12 bits per symbol.
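The submitter's arithmetic can be checked in a few lines. This is a sketch that uses only the value 83 and the three symbols quoted in the report. Whether one counts the exact information content or whole bits, the cost lands at roughly 2.1 to 2.3 bits per symbol, not 1.42.

```python
import math

encoded_value = 83   # arithmetic-coding output for "GGB", as quoted in the report
symbol_count = 3     # G, G, B

exact_bits = math.log2(encoded_value)     # information content, about 6.38 bits
whole_bits = encoded_value.bit_length()   # bits needed to write 83 in binary: 7
print(f"{exact_bits / symbol_count:.3f} bits/symbol (exact)")
print(f"{whole_bits / symbol_count:.3f} bits/symbol (whole bits)")
```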

Abhinav Upadhyay  Sep 23, 2023 
PDF Page 69
first line

On this first line
.. write n=2^(N + L) (where L =n-2^N)...

Should this be n = (2^N) + L like the example below on the same page?
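A short check, not from the book and with a hypothetical helper name. With N = floor(log2 n) and L = n - 2^N, the sum 2^N + L recovers n for every positive integer, while 2^(N + L) generally does not.

```python
def split_fields(n: int) -> tuple[int, int]:
    """Split n >= 1 into N = floor(log2 n) and L = n - 2^N."""
    big_n = n.bit_length() - 1       # position of the highest 1 bit
    remainder = n - (1 << big_n)     # always in the range [0, 2^N)
    return big_n, remainder

for n in range(1, 9):
    big_n, remainder = split_fields(n)
    print(n, big_n, remainder, 2 ** big_n + remainder, 2 ** (big_n + remainder))
```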

Anonymous  Jun 21, 2016 
PDF Page 69
The example table

The example for n = 2 has 2^1+0 = 101.

Decoding this, the first part is B(10) = 2, giving 2^2, and the remainder is B(01) = 1, so decoding gives 2^2 + 1 = 5.

Should n = 2 be B(01) = 2^0 + 1?

Anonymous  Jun 25, 2016 
Printed Page 72
8th paragraph

On section "creating the reference table", the text explains that to calculate the values in a row, it is necessary to multiply the row number with the symbol probability. The correct operation is division, as also reported in the table on the next page.

Antonio  Jul 02, 2020 
Printed Page 86
Point 7

The adaptive VLC steps miss the one that should output the "B" character for the first time.

Matteo Contrini  Feb 04, 2018 
Printed Page 91
Second to last paragraph

WebM is again mentioned as a video compression codec/encoder/algorithm, but WebM is a container format

Matteo Contrini  Feb 04, 2018 
Printed Page 97
1st figure

The entropy of the two-symbol set should be 0.97, not 2.2, making it vastly superior to the other tokenization options. Less critically, the entropy of the 1st figure on the facing page (p96) is also incorrect: it should be 2.42, not 2.38.

Kevin Nygaard  Oct 02, 2023 
Printed Page 151
1st line

"context missing" should be "context mixing"

Matteo Contrini  Feb 13, 2018 
Printed Page 157
second sentence

"GZIP, BZIP, and now" should be "GZIP, and now"

For most of the history of HTTP, DEFLATE and its wrapper GZIP were the only supported means of compression. BZIP2 is not a part of the "Standard HTTP stack" -- the only browser I'm aware of that ever supported BZIP2 was an early version of Chrome, and that code was ripped out long ago.

Today, everybody does GZIP/DEFLATE, Chrome+Firefox+Opera do Brotli, Chrome+Opera do SDCH, and Opera does lzma.

Eric Lawrence  Aug 04, 2016 
Printed Page 168
image block

Image quality in the black and white printed text is so low that the graphic loses all ability to convey "degrading quality"; each version looks essentially identical.

Eric Lawrence  Aug 12, 2016 
Printed Page 170
image block

Image quality in the black and white printed text is so low that the graphic loses all ability to convey a distinction between 128 and 32 colors; each version looks essentially identical.

Eric Lawrence  Aug 12, 2016 
Printed Page 197
last word on page

the word "battery" should instead be "radio". The radio being on is what drains the battery. The battery is always "on".

Eric Lawrence  Aug 04, 2016 
Printed Page 200
first sentence

"have a valid computing experience"

It's not clear what is meant by "valid" here. Maybe use "good" or "compelling" or any similar word.

Eric Lawrence  Aug 04, 2016