Errata

Cassandra: The Definitive Guide

Errata for Cassandra: The Definitive Guide


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version Location Description Submitted By Date submitted Date corrected
NA
NA

As this is on Safari, I can't list the page number or location on the page.

In chapter 4, "The Cassandra Query Language", section "Secondary Indexes", there is the following statement:

"Note that for maps in particular, we have the option of indexing either the keys, via the syntax KEYS(addresses), or the values, which is the default. You may not create indexes on both the keys and values of a map."

This may have been true for the version used when writing this book, but it is not the case for the most recent version of Cassandra, and it is not clear exactly in which version this changed.

I have not personally verified this, but I cite the following from the #cassandra IRC channel:
------------------
<davidmichaelkarr> (David M. Karr) Question about indexes. I found the following statement in "C:TDG": "Note that for maps in particular, we have the option of indexing either the keys, via the syntax KEYS(addresses), or the values, which is the default. You may not create indexes on both the keys and values of a map."
[11:51:41] <davidmichaelkarr> Is that saying you can't create a single index covering both the keys and the values, or is it saying that a particular map column cannot have both its keys and values indexed?
[11:52:44] <thobbs> (Tyler Hobbs) davidmichaelkarr: I think it's saying both, but not all of that is necessarily true in recent C* versions
[11:52:57] <thobbs> first, you can create an index on key-value pairs
[11:53:09] <thobbs> and second, iirc, in 3.0+ you can index both keys and values separately
[11:53:27] <thobbs> that might have actually been added in some later 3.x, but I think it's true in 3.0
[11:53:36] <davidmichaelkarr> thobbs: Ok, so it's just not accurate for recent versions then. Ok.
[11:53:59] <thobbs> right
------------------

Note from the Author or Editor:
The change to allow both keys and values to be indexed appears to have been added in the 2.2 release (compare https://docs.datastax.com/en/cql/3.3/cql/cql_using/useIndexColl.html with https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html).

Please change the text:

"Note that for maps in particular, we have the option of indexing either the keys, via the syntax KEYS(addresses), or the values, which is the default. You may not create indexes on both the keys and values of a map."

to

"Note that for maps in particular, we have the option of indexing either the keys, via the syntax KEYS(addresses), or the values (which is the default), or both (in Cassandra 2.2 or later)."

David M. Karr  Sep 19, 2016  Apr 07, 2017
NA
NA

I don't even know if it makes sense to report this error here, but in Safari Books Online, the "URL" button on each page is supposed to produce a URL that returns to the same page, assuming the user is logged into Safari. This works perfectly fine for all the Safari books I've read in the past. For some reason, the URLs that I get for pages in this book only return to the table of contents of the book.

I just tested this again for two other books, and it still works fine. The URLs produced for this book just return to the TOC.

Note from the Author or Editor:
Confirmed that the links from individual pages in Safari Books Online do not work.

David M. Karr  Sep 22, 2016 
Printed
Chapter 5, Calculating Size on Disk

The confirmed erratum for the previous section ("Calculating Partition Size") says that the number of hotels should not be considered in the partition size. The subsequent section ("Calculating Size on Disk") also needs to be corrected: the number of hotels should affect neither the partition size limit nor the partition size calculation, because the data for each hotel is stored in its own partition.

Note from the Author or Editor:
The reader is correct: since there is a partition for each hotel, the number of hotels is not a factor, and this should flow through the calculations in both sections. The corrected formulas are:

Nr = 100 rooms/hotel × 730 days = 73,000 rows

and

Partition size = 16 bytes + 0 bytes + 0.51 MB + 0.58 MB = 1.1 MB
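The corrected figures can be reproduced with a short calculation. This is a minimal sketch, not the book's code: the class and method names are hypothetical, and the inputs (a 16-byte partition key, no static columns, 7 bytes of column data per row, one non-key value per row, and an 8-byte timestamp per value) are assumptions inferred from the numbers above (0.51 MB ≈ 73,000 × 7 bytes, 0.58 MB ≈ 73,000 × 8 bytes).

```java
import java.util.Locale;

// Minimal sketch of the corrected Chapter 5 estimate for ONE hotel's partition.
// Assumptions (inferred from the errata figures, not taken from the book's code):
// 16-byte partition key, no static columns, 7 bytes of column data per row,
// one non-key value per row, and an 8-byte timestamp per value.
final class PartitionSizeEstimate {

    // Rows in a single partition: rooms per hotel times days of inventory.
    // The number of hotels does not appear, because each hotel has its own partition.
    static long rowsPerPartition(long roomsPerHotel, long days) {
        return roomsPerHotel * days;
    }

    // Partition key bytes + static column bytes + per-row column data + per-value timestamps.
    static long partitionSizeBytes(long partitionKeyBytes, long staticBytes,
                                   long rows, long bytesPerRow,
                                   long values, long timestampBytes) {
        return partitionKeyBytes + staticBytes + rows * bytesPerRow + values * timestampBytes;
    }

    public static void main(String[] args) {
        long rows = rowsPerPartition(100, 730);   // 100 rooms/hotel x 730 days
        long values = rows;                       // assumed: one non-key value per row
        long bytes = partitionSizeBytes(16, 0, rows, 7, values, 8);
        // Decimal megabytes, matching the 0.51 MB / 0.58 MB figures above.
        System.out.printf(Locale.ROOT, "rows=%d, bytes=%d (~%.2f MB)%n",
                rows, bytes, bytes / 1_000_000.0);
    }
}
```

Running main prints rows=73000, bytes=1095016 (~1.10 MB), which rounds to the 1.1 MB above; the hotel count never enters the calculation.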

Anonymous  Mar 30, 2017  Apr 07, 2017
Printed
Page 2
5th paragraph, first sentence

The sentence reads:

In his 1970 paper "A Relational Model of Data for Large Shared Data Banks," Dr. Edgar F. Codd, also at advanced his theory of the relational model for data while working at IBM's San Jose research laboratory.

The "at" before "advanced" appears to be a typo.

Note from the Author or Editor:
The phrase "also at" should be revised to "also at IBM,".

In his 1970 paper "A Relational Model of Data for Large Shared Data Banks," Dr. Edgar F. Codd, also at IBM, advanced his theory of the relational model for data while working at IBM's San Jose research laboratory.

David Maldonado  Apr 23, 2017 
PDF
Page 64
1st code block

The instructions below don't actually update the row: the last name and its timestamp remain unchanged. Cassandra silently ignores the UPDATE statement, and the SELECT displays the old value.


cqlsh:my_keyspace> UPDATE user USING TIMESTAMP 1434373756626000
SET last_name = 'Boateng' WHERE first_name = 'Mary' ;
cqlsh:my_keyspace> SELECT first_name, last_name,
WRITETIME(last_name) FROM user WHERE first_name = 'Mary';


cqlsh> SHOW VERSION;
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]

Note from the Author or Editor:
The comment is correct. The timestamp provided in the book is, by definition, in the past relative to the time you run the example, so the update is treated as an earlier change than your original write and loses when the two are reconciled. The UPDATE statement itself is still valid, which is why no error message is generated.

The text that reads as follows:

To do this, we’ll use the CQL UPDATE command for the first time, using the
optional USING TIMESTAMP option:

Should be modified to the following:

To do this, we’ll use the CQL UPDATE command for the first time. We'll use the
optional USING TIMESTAMP option to manually set a timestamp (note that the timestamp must be later than the one from our SELECT command, or the UPDATE will be ignored):
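The behavior described here is Cassandra's last-write-wins rule: for each cell, the write with the highest timestamp is kept. The following is a simplified in-memory model in plain Java (a hypothetical class, not driver or server code); it ignores tombstones, TTLs, and Cassandra's tie-breaking rules for equal timestamps. It also shows one way to produce a current timestamp in microseconds, the convention followed by the book's value 1434373756626000.

```java
// Simplified last-write-wins model of a single cell (hypothetical class, illustration only).
// Not modeled: tombstones, TTLs, and Cassandra's tie-breaking when timestamps are equal.
final class LastWriteWinsCell {
    private String value;
    private long writeTimeMicros = Long.MIN_VALUE;

    // Applies a write only if its timestamp is newer than the stored one.
    // Returns false when the write loses, which Cassandra does silently (no error).
    boolean write(String newValue, long timestampMicros) {
        if (timestampMicros > writeTimeMicros) {
            value = newValue;
            writeTimeMicros = timestampMicros;
            return true;
        }
        return false;
    }

    String value() {
        return value;
    }

    long writeTime() {
        return writeTimeMicros;
    }

    // Current time as microseconds since the Unix epoch, the convention used
    // by timestamps such as the book's 1434373756626000.
    static long nowMicros() {
        return System.currentTimeMillis() * 1000L;
    }
}
```

For example, after writing a value with nowMicros(), a second write using the book's timestamp 1434373756626000 returns false and leaves the value unchanged, which mirrors the silently ignored UPDATE.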

Ihor Mochurad  Aug 19, 2016  Apr 07, 2017
Printed
Page 66
tinyint definition

The tinyint Cassandra type is defined as

tinyint: An 8-bit signed integer (as in Java)

Since Java does not have a primitive type called tinyint, for consistency with the other type definitions it would be better to define it as

tinyint: An 8-bit signed integer (equivalent to a Java byte)


Note from the Author or Editor:
Good recommendation. Please change as recommended to:

tinyint: An 8-bit signed integer (equivalent to a Java byte)
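To make the Java equivalence concrete, here is a small plain-Java sketch (the helper names are hypothetical) of the tinyint range. Java's narrowing cast from int to byte wraps around rather than failing, so a range check before conversion is worth having when preparing values for a tinyint column.

```java
// Range of a CQL tinyint, which matches a Java byte (8-bit signed).
// Hypothetical helper for illustration.
final class TinyintRange {
    static final int MIN = Byte.MIN_VALUE; // -128
    static final int MAX = Byte.MAX_VALUE; // 127

    // (byte) 200 silently becomes -56 in Java, so reject out-of-range values explicitly.
    static byte toTinyint(int value) {
        if (value < MIN || value > MAX) {
            throw new IllegalArgumentException("out of tinyint range: " + value);
        }
        return (byte) value;
    }
}
```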

Paolo Baronti  Mar 10, 2018 
PDF
Page 70
2nd paragraph of inet block

The collapsed version of the inet value is printed as 2001:db8:85a3:a::8a2e:370:7334.
However, when tested on Cassandra 3.10, it is displayed as 2001:db8:85a3::8a2e:370:7334 (with a double colon in the middle).
Why is it not 2001:db8:85a3:::8a2e:370:7334 (with a triple colon in the middle)?

Note from the Author or Editor:
There is an extra "a:" group in the collapsed address that should be removed. Because the run of zero groups is collapsed into a single "::", the corrected text should read as follows:

"... so the preceding value is rendered as follows when read using SELECT: 2001:db8:85a3::8a2e:370:7334."
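The double colon the reader observed is the standard compressed form: "::" stands in for a single run of consecutive all-zero groups, so a valid IPv6 address never contains ":::". The sketch below is our own plain-Java illustration of the RFC 5952 formatting rules (lowercase hex, leading zeros dropped, the first longest run of two or more zero groups replaced by "::"); it is not how Cassandra or cqlsh implement the display. The expanded input 2001:0db8:85a3:0000:0000:8a2e:0370:7334 is an assumption chosen to match the observed output.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: formats an IPv6 literal in compressed (RFC 5952 style) form.
final class Ipv6Format {

    static String compress(String literal) {
        byte[] bytes;
        try {
            // For an IP literal, getByName parses the text; no DNS lookup is made.
            bytes = InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("invalid address: " + literal, e);
        }
        if (bytes.length != 16) {
            throw new IllegalArgumentException("not an IPv6 address: " + literal);
        }
        // Split the 128 bits into eight 16-bit groups.
        int[] groups = new int[8];
        for (int i = 0; i < 8; i++) {
            groups[i] = ((bytes[2 * i] & 0xff) << 8) | (bytes[2 * i + 1] & 0xff);
        }
        // Find the first longest run of zero groups.
        int bestStart = -1;
        int bestLen = 0;
        for (int i = 0; i < 8; ) {
            if (groups[i] != 0) {
                i++;
                continue;
            }
            int j = i;
            while (j < 8 && groups[j] == 0) {
                j++;
            }
            if (j - i > bestLen) {
                bestStart = i;
                bestLen = j - i;
            }
            i = j;
        }
        if (bestLen < 2) {
            bestStart = -1; // "::" is never used for a single zero group
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i == bestStart) {
                sb.append("::");
                i += bestLen - 1; // skip the rest of the zero run
                continue;
            }
            if (sb.length() > 0 && sb.charAt(sb.length() - 1) != ':') {
                sb.append(':');
            }
            sb.append(Integer.toHexString(groups[i])); // lowercase, no leading zeros
        }
        return sb.toString();
    }
}
```

The groups are formatted by hand because Java's InetAddress.getHostAddress() prints IPv6 addresses without "::" compression (for example, with "0:0" for the two zero groups).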

Antonius Sopian  Apr 24, 2017 
PDF
Page 74
3

Now that we have defined our address type, we’ll try to use it in our user table, but if
you’re using Cassandra 2.1 or earlier, you’ll run into a problem:
cqlsh:my_keyspace> ALTER TABLE user ADD
addresses map<text, address>;
InvalidRequest: code=2200 [Invalid query] message="Non-frozen
collections are not allowed inside collections: map<text,
address>"

I am facing this issue in Cassandra 3.7, whereas the book says it only affects versions 2.1 and earlier.
cqlsh:my_keyspace> SHOW VERSION;
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]

Note from the Author or Editor:
This problem still exists in releases through 3.7; see https://issues.apache.org/jira/browse/CASSANDRA-7826.

The reference to release 2.1 should be omitted, and the text modified to say: "Now that we have defined our address type, we’ll try to use it in our user table, but we immediately run into a problem:"

Ihor Mochurad  Aug 19, 2016  Apr 07, 2017
PDF
Page 75

The CREATE TABLE statement is listed like so:

```
CREATE TABLE my_keyspace.user (
first_name text PRIMARY KEY,
addresses map<text, frozen<address>>,
emails set<text>,
id uuid,
last_name text,
login_sessions map<timeuuid, int>,
phone_numbers list<text>,
title text
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.
SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
```

However, this throws two errors:

- caching needs to be a map
- map keys and values need to be single-quoted

The statement that actually works is the following. Note the changes to "caching" (the wrapped class names are also joined onto single lines):

```
CREATE TABLE my_keyspace.user (
first_name text PRIMARY KEY,
addresses map<text, frozen<address>>,
emails set<text>,
id uuid,
last_name text,
login_sessions map<timeuuid, int>,
phone_numbers list<text>,
title text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}
AND comment = ''
AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
```

Note from the Author or Editor:
Reader is correct. The caching setting should be a map literal with single-quoted keys and values, rather than a string containing double-quoted JSON:

AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}

Raju Gandhi  Feb 08, 2017  Apr 07, 2017
PDF
Page 76
1st code block


cqlsh:my_keyspace> SELECT * FROM user WHERE last_name = 'Nguyen';
InvalidRequest: code=2200 [Invalid query] message="No supported
secondary index found for the non primary key columns restrictions"

The book says that after attempting to query by last name, the user will see the following error message: code=2200 [Invalid query] message="No supported secondary index found for the non primary key columns restrictions". In practice, the message differs slightly:


InvalidRequest: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

cqlsh:my_keyspace> SHOW VERSION;
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]

Note from the Author or Editor:
Since the book includes many output examples, it’s inevitable that many of the statement formats will undergo slight changes in future versions of Cassandra.

The preface of the book should note more explicitly that examples were run against Cassandra 3.0.

Specifically, the following section should be revised:

Cassandra Versions Used in This Book
This book was developed using the Cassandra 3.X series of releases, along with the DataStax Java Driver version 3.0.

This statement should be revised to:

This book was developed using Apache Cassandra 3.0 and the DataStax Java Driver version 3.0. The formatting and content of tool output, log files, configuration files, and error messages are as they appear in the 3.0 release, and may change in future releases.

Ihor Mochurad  Aug 19, 2016  Apr 07, 2017
PDF
Page 78
Summary paragraph

The link at the bottom of the page to the full language specification is broken.

https://cassandra.apache.org/doc/cql3/CQL.html

Not Found

The requested URL /doc/cql3/CQL.html was not found on this server.

Note from the Author or Editor:
The Cassandra team has recently reorganized the documentation available at the Apache site (after publication of the book).

The new URL for the CQL specification is: http://cassandra.apache.org/doc/latest/cql/index.html

Ihor Mochurad  Aug 19, 2016  Apr 07, 2017
Printed
Page 97
4th paragraph and following formula

Author received the following comment: "Do you think that calculation on the page 97, section, calculating the partition size is right? I believe, number of values is calculated for a partition rather than entire rows. In your calculation you are calculating for 5000 hotels, but each hotel is a partition (available_room_by_hotel_date). So I believe your assumption is wrong. Could you please clarify the calculation? I refer the datastax online video tutorial, it also says about rows per partition rather than either rows."

The reader is correct, the number of hotels should not be included when calculating the number of rows per partition for this table, because each hotel will have its own partition.

Therefore the text should read as follows:

So the number of values for this table is equal to the number of rows. We still need to determine a number of rows. To do this, we make some estimates based on the application we’re designing. Our table is storing a record for each room, in each of our hotels, for every night. Let’s assume that our system will be used to store two years of inventory at a time, and there are 5,000 hotels in our system, with an average of 100 rooms in each hotel.

This leads to an estimated number of rows as follows:
Nr = 100 rooms/hotel × 730 days = 73,000 rows

This relatively small number of rows per partition is not going to get us in too much trouble, but if we start adding a lot of hotels or don’t manage the size of our inventory well using TTL, we could start having issues. We still might want to look at breaking up this partition, which we’ll do shortly.

Jeffrey Carpenter  Aug 22, 2016  Apr 07, 2017
Printed
Page 97
1st paragraph

N_pk should be the number of partition key columns instead of the number of primary key columns. Otherwise, clustering keys are not considered in the calculation of partition sizes.

Note from the Author or Editor:
I spoke with Artem Chebotko who created these formulas originally and updated them to account for the storage format changes that came in Apache Cassandra 3.0 with the new storage engine implementation.

Technically the description of the formula is correct, since the number of values is described as the number of cells, which is a specific reference to the storage format. Clustering column values are stored in a row header rather than as cells. This is a change from the pre-3.0 storage format, in which the clustering column values were stored as cells.

While the description is technically correct, I can see how it could be confusing, and potentially not very useful, to omit the clustering column values from a calculation of the number of values. To include those values as well, multiply the number of rows by the number of clustering columns and add the result to the number of values.
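Both ways of counting can be expressed in a few lines. This sketch assumes the cell-count formula from Chapter 5 has the form Nv = Nr × (Nc − Npk − Ns) + Ns, where Nr is the number of rows, Nc the number of columns, Npk the number of primary key columns, and Ns the number of static columns; the class and method names are hypothetical. The second method adds one value per clustering column per row, as the note above suggests.

```java
// Sketch of the number-of-values calculation discussed above.
// Assumed formula: Nv = Nr * (Nc - Npk - Ns) + Ns
final class ValueCount {

    // Counts cells only: primary key columns (partition key and clustering columns)
    // are excluded, since in the 3.0 storage format clustering values are kept in
    // the row header rather than stored as cells.
    static long cells(long rows, int columns, int primaryKeyColumns, int staticColumns) {
        return rows * (columns - primaryKeyColumns - staticColumns) + staticColumns;
    }

    // Variant suggested in the note: also count one value per clustering column per row.
    static long cellsPlusClusteringValues(long rows, int columns, int primaryKeyColumns,
                                          int staticColumns, int clusteringColumns) {
        return cells(rows, columns, primaryKeyColumns, staticColumns)
                + rows * clusteringColumns;
    }
}
```

For a hypothetical table shaped like the hotel inventory example (four columns, one partition key column, two clustering columns, no static columns, 73,000 rows), cells(73_000, 4, 3, 0) returns 73,000 and cellsPlusClusteringValues(73_000, 4, 3, 0, 2) returns 219,000.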

Nick Triller  Aug 24, 2017 
Printed
Page 137
1st sentence

The last part of the first sentence "The random partitioner ..." states this partitioner "... is Cassandra's default". This should be replaced by "was Cassandra's default in Cassandra 1.1 and earlier".

Note from the Author or Editor:
Agree with recommended change as described above.

Anonymous  Dec 28, 2016  Apr 07, 2017
Printed
Page 163
Second code sample

The second line of the code sample contains an error. "MappingManager" is capitalized, so it refers to the class MappingManager instead of the variable mappingManager declared on the previous line.

The line should read:

Mapper<Hotel> hotelMapper = mappingManager.mapper(Hotel.class);

Jeffrey Carpenter  Jun 10, 2017 
Printed
Page 186
second full paragraph (the one after the first code sample)

The text regarding lightweight transactions reads:

"This command checks to see if there is a record with the partition key, which for this table consists of the hotel_id."

This should be clarified: it is not just the partition key but the entire primary key that must be unique. The lightweight transaction is checking that the row does not already exist, and for a multi-row partition this distinction is important.

The sentence should be changed to read:

"This command checks to see if the row already exists, that is, if there is a record with the same primary key, which for this table consists of the hotel_id."
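The distinction can be illustrated with a small in-memory model in plain Java (no driver). This is a hypothetical sketch: the class, the two-part (partition key, clustering key) primary key, and the method names are ours, and it ignores everything that real lightweight transactions involve beyond the existence check (Paxos rounds, consistency levels, returned columns).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// In-memory model (not Cassandra or driver code) of INSERT ... IF NOT EXISTS:
// the existence check uses the FULL primary key, here a hypothetical
// (partition key, clustering key) pair.
final class IfNotExistsModel {
    private final Map<List<String>, String> rows = new HashMap<>();

    // Returns true (like "[applied] = True") only if no row with this primary key exists.
    boolean insertIfNotExists(String partitionKey, String clusteringKey, String value) {
        Objects.requireNonNull(value, "value");
        List<String> primaryKey = Arrays.asList(partitionKey, clusteringKey);
        return rows.putIfAbsent(primaryKey, value) == null;
    }

    int rowCount() {
        return rows.size();
    }
}
```

With a hypothetical hotel ID such as "AZ123", inserting rows with clustering keys "2016-01-01" and then "2016-01-02" both succeed, because they share a partition but not a primary key; repeating "2016-01-01" is rejected.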

Jeffrey Carpenter  Jun 16, 2017 
Printed
Page 292
middle

On page 292, there is a sentence midway down that reads "As with authentication, the authentication mechanism is pluggable". I believe it should read: "As with authentication, the authorization mechanism is pluggable".

Note from the Author or Editor:
The comment is correct. The sentence should read:

"As with authentication, the authorization mechanism is pluggable".

Steve Halladay  Apr 28, 2017 
PDF
Page 297
1st code block

It looks like we are creating a truststore on node 1 and adding to it a certificate generated by node 1. I'm not sure that makes sense. Does node 1 need to establish a secure connection with itself?

$ keytool -import -v -trustcacerts -alias node1 -file node1.cer \
-keystore node1.truststore

I would change it to:
$ keytool -import -v -trustcacerts -alias node1 -file node1.cer \
-keystore nodeX.truststore

where X is the node on which we create a truststore and into which we import the certificate produced by node 1.

Note from the Author or Editor:
This is a good clarification. Node 1 doesn't need to add its own public cert.

I would replace the sentence "Each command looks something like the following:" with the sentence "For example, to add the certificate for node 1 to the truststore for node 2, we would use the following command:"

And the command should be changed to:

$ keytool -import -v -trustcacerts -alias node1 -file node1.cer \
-keystore node2.truststore
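Following the same pattern for a whole cluster means importing each node's certificate into every other node's truststore. The helper below is a hypothetical plain-Java sketch that only generates the commands as strings; it reuses the exact keytool options and the nodeN.cer / nodeN.truststore naming from the example above. When the commands are actually run, keytool will still prompt for the truststore password and ask for confirmation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the keytool import commands for an n-node cluster,
// importing each node's certificate into every OTHER node's truststore.
final class TruststoreCommands {

    static List<String> importCommands(int nodeCount) {
        List<String> commands = new ArrayList<>();
        for (int target = 1; target <= nodeCount; target++) {
            for (int source = 1; source <= nodeCount; source++) {
                if (source == target) {
                    continue; // a node does not need its own certificate in its truststore
                }
                commands.add(String.format(
                        "keytool -import -v -trustcacerts -alias node%d -file node%d.cer"
                                + " -keystore node%d.truststore",
                        source, source, target));
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        importCommands(3).forEach(System.out::println);
    }
}
```

For a three-node cluster, importCommands(3) yields six commands, including the node 1 into node 2 command shown above, and none that imports a node's certificate into its own truststore.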


Ihor Mochurad  Sep 03, 2016  Apr 07, 2017
Mobi
Page 2282
3rd paragraph of section "Calculating Partition Size"

I'm reading the MOBI edition, so there are no page numbers.

In the 3rd paragraph of section "Calculating Partition Size" there is duplicate "of" in the phrase "and the number of of values per row"

Note from the Author or Editor:
Chapter 5, Pg 97, 1st Paragraph, remove repeated "of" as described above.

Alex Ott  May 15, 2018 
Mobi
Page 3441
2nd paragraph of "Startup and JVM Settings"

The page number is the approximate position in the MOBI file.

The scripts are called conf/cassandra-env.sh and conf/cassandra-env.ps1, not conf/cassandra.env.sh and conf/cassandra.env.ps1 as in the book.

Note from the Author or Editor:
Chapter 7, Pg 144, second paragraph in "Startup and JVM Settings", second sentence should read:

"The key file to look at is the environment script conf/cassandra-env.sh (or conf/cassandra-env.ps1 PowerShell script on Windows)."

(please maintain italics on file names conf/cassandra-env.sh and conf/cassandra-env.ps1)

Alex Ott  May 15, 2018 
Mobi
Page 5018
The "More on JMX" section

This section first talks about SNMP, but then mentions SMTP in its place in the sentence "which may be useful if you are using SMTP monitoring tools such as Nagios or Zenoss".

Note from the Author or Editor:
Chapter 10, Page 212, paragraph following the figure, remove the reference to SMTP so that the sentence reads:

"The JVM also offers management capabilities via Simple Network Management Protocol (SNMP), which may be useful if you are using monitoring tools such as Nagios or Zenoss."

Alex Ott  May 15, 2018 
Mobi
Page 7088
"Production environment" item in the "Selecting instances" section

In the sentence about machines for production environments, the book uses MB instead of GB for the memory size: "and anywhere from 16 MB to 64 MB of memory"

Note from the Author or Editor:
Chapter 14, "Selecting Instances", Pg 305, paragraph "Production environments"

The sentence should read:

"Cassandra nodes in production environments should have CPUs with at least eight cores (although four cores are acceptable for virtual machines), and anywhere from 16GB to 64GB of memory."

Alex Ott  May 15, 2018