Errata

Presto: The Definitive Guide

Errata for Presto: The Definitive Guide


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version Location Description Submitted By Date submitted Date corrected
PDF, ePub
Page 23
Adding a Data Source 4th paragraph

The paragraph says "let’s say you create catalog properties files etc/cdh-hadoop.properties, etc/sales.properties, etc/web-traffic.properties, and etc/mysql-dev.properties."

However, according to the earlier description, catalog properties files belong in the etc/catalog/ folder. The file names should therefore be etc/catalog/cdh-hadoop.properties, etc/catalog/sales.properties, etc/catalog/web-traffic.properties, and etc/catalog/mysql-dev.properties.
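Because Presto names each catalog after its properties file, both the directory and the file name decide which catalogs exist. The following Python sketch models that rule for the four files above. The helper `catalog_names` is hypothetical and only illustrates the naming convention under the assumption of the default etc/catalog/ layout; it is not part of Presto.

```python
from pathlib import PurePosixPath

# Catalog properties files from the corrected example, under etc/catalog/.
CATALOG_FILES = [
    "etc/catalog/cdh-hadoop.properties",
    "etc/catalog/sales.properties",
    "etc/catalog/web-traffic.properties",
    "etc/catalog/mysql-dev.properties",
]


def catalog_names(paths):
    """Hypothetical helper: map properties files to catalog names.

    Assumes the default layout where only files in etc/catalog/ are
    treated as catalogs; files elsewhere (e.g. directly in etc/) are skipped.
    """
    names = []
    for p in paths:
        path = PurePosixPath(p)
        if path.parent == PurePosixPath("etc/catalog") and path.suffix == ".properties":
            names.append(path.stem)  # file name without ".properties"
    return names


print(catalog_names(CATALOG_FILES))
# ['cdh-hadoop', 'sales', 'web-traffic', 'mysql-dev']
print(catalog_names(["etc/sales.properties"]))
# []
```

With the files placed directly in etc/, as in the original wording, no catalogs would be found under this model.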

Note from the Author or Editor:
Fixed

Chen Zhang  May 06, 2020 
Printed, PDF
Page 26
1st paragraph code area

Formatting error in this snippet:
"
...
$ mv presto ~/bin
...
"
In the printed book, the "~" is rendered as its hex character code instead of as the "~" character itself.

Note from the Author or Editor:
Thank you for finding this issue. I have fixed the problem in the source and the next revision of the book will include the updated snippet.

Steven Zheng  Apr 22, 2020 
PDF
Page 32
Downloading and Registering the Driver, the first paragraph

It says "The server is available as a JAR file,"
but I think it should be "The *driver* is available as a JAR file."

Note from the Author or Editor:
Fixed

Chen Zhang  May 07, 2020 
Printed, PDF, ePub
Page 40
Figure 4-3

Figure 4-3 clearly shows that the client sends a SELECT query to the coordinator, and the latter sends back the result set after the query has been processed. The arrows point in the wrong directions and should be reversed.

Note from the Author or Editor:
Thank you for submitting the errata. You are correct: the arrows need to be reversed. The top one under the query should point to the coordinator on the right. The bottom one above the results should point to the CLI on the left.

Huang Pengcheng  Apr 10, 2020 
PDF
Page 46
Figure 4-3

The arrows between the CLI and the coordinator point in the wrong directions. The SQL statement should point to the coordinator, and the return values should point back to the CLI.

Note from the Author or Editor:
The feedback is correct. I requested an image update for the new revision.

Brian Olsen  May 11, 2020 
Printed, PDF, ePub
Page 58
3rd paragraph, code snippet

On the fourth line of the snippet,
"// then inner join cutkey" should be "// then inner join custkey".

Note from the Author or Editor:
Already fixed

Steven Zheng  May 01, 2020 
PDF
Page 58
example 4-4

In Example 4-4, there is a comment "// then inner join cutkey".

"cutkey" is a typo; it should be "custkey".

Note from the Author or Editor:
Fixed

Chen Zhang  May 22, 2020 
PDF
Page 110
last paragraph

It says "The key is sorted first by the key and the column in ascending lexicographic order".

It makes no sense for a "key" to be sorted first by "the key". I guess the sentence should read "The key is sorted first by the row ID and the column ...".
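To illustrate the corrected sentence, here is a simplified Python sketch of the ordering it describes: keys sorted by row ID first and then by column, both in ascending lexicographic order. This is a model under assumptions, not the storage engine's code: the `Key` type is hypothetical, the column is treated as a single string, and other key components (such as timestamps) are ignored.

```python
from typing import NamedTuple


class Key(NamedTuple):
    # Hypothetical, simplified key: one string stands in for the full column.
    row_id: str
    column: str


keys = [
    Key("row2", "name"),
    Key("row1", "zip"),
    Key("row1", "age"),
]

# Sort by row ID first, then by column; Python compares strings
# lexicographically, so both levels are in ascending lexicographic order.
ordered = sorted(keys, key=lambda k: (k.row_id, k.column))
for k in ordered:
    print(k.row_id, k.column)
# row1 age
# row1 zip
# row2 name
```

Sorting "by the key" first, as the original wording says, would be circular; the tuple `(row_id, column)` makes the two sort levels explicit.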

Note from the Author or Editor:
Fixed

Chen Zhang  May 14, 2020 
Printed, PDF, ePub
Page 138
Chapter 6 Connector 5th paragraph

Another advanced feature of a connector is to provide table statistics, *which* (instead of "that") can be used ... to query plans. (There were two dots in the original ePub version.)

Note from the Author or Editor:
I fixed this in the source code now and pushed the change to master.

Huang Pengcheng  Apr 11, 2020 
Printed, PDF, ePub
Page 210
Chapter 9 Logical Operators

In answer to "What are the best days of the week to fly out of Boston in the month of February?", the original SQL sorts by the dayofweek column. Sorting by the delay column would be more precise, because it ranks the days by least delay.

Note from the Author or Editor:
Since there are only 7 days in the results, the reader can easily see the best days from the 1-7 dayofweek column. Sorting by the delay is also possible and should probably be combined with changing dayofweek from a number to the name of the day. However, I do not think this is necessary in the context where the query is used in the book.

Huang Pengcheng  Apr 13, 2020 
ePub
Page 210
Chapter 6 6.4 Hive Connector for Distributed Storage Data Sources

"hdfs:/user/hive/warehouse/web/page_views/..." -> hdfs://

There should be a double slash here.

Note from the Author or Editor:
Fixed

Huang Pengcheng  May 10, 2020 
PDF
Page 216
Figure 10.7

Figures 10.7, 10.8, and 10.9 use the wrong label for the TLS certificate.

It should be changed from

TLS Certification

to

TLS Certificate

for all occurrences in the three images.

Manfred Moser  Apr 15, 2020 
Printed, PDF, ePub
Page 280
chapter 12 Memory Management

The example for a small cluster has 10 workers, but the later description uses 8 workers to calculate the memory per node a query can use. Why not use 10?

Note from the Author or Editor:
As mentioned in the paragraph, the calculation should use the number of workers or fewer. In the example we have 10 nodes; minus 1 coordinator node, that leaves 9 nodes. One less to be on the safe side, and we have 8 to use for our calculation. This makes the allocation a bit more generous, and since this is only a rule-of-thumb calculation, this is sufficient.
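The worker count in the reply above can be written as a small calculation. The Python sketch below assumes the per-node share is obtained by dividing a cluster-wide query memory budget by the usable worker count; both function names and the 80 GB budget are hypothetical illustration values, not figures from the book.

```python
def usable_workers(total_nodes: int, coordinators: int = 1, safety_margin: int = 1) -> int:
    """Nodes to count when sizing per-node query memory (rule of thumb).

    Subtract the coordinator, then one more node to be on the safe side.
    """
    return total_nodes - coordinators - safety_margin


def per_node_memory_gb(cluster_query_memory_gb: float, total_nodes: int) -> float:
    # Hypothetical: spread a cluster-wide budget over the usable workers.
    # Dividing by fewer workers yields a larger (more generous) per-node share.
    return cluster_query_memory_gb / usable_workers(total_nodes)


workers = usable_workers(10)        # 10 - 1 coordinator - 1 spare = 8
print(workers)                      # 8
print(per_node_memory_gb(80, 10))   # 10.0 (80 GB is an illustrative value)
```

Dividing by 8 rather than 9 (or 10) gives each node a slightly larger allowance, which is the "bit more generous" effect the author describes.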

Huang Pengcheng  Apr 14, 2020 
ePub
Page 280
Chapter 6 -File Formats and Compression-

You can change the *code* to use SNAPPY or NONE by setting the hive.compression-codec configuration in the catalog properties file.

code->codec

Note from the Author or Editor:
Fixed

Huang Pengcheng  May 01, 2020 
ePub
Page 350
Chapter 7 Apache Cassandra Connector 3rd paragraph

By using the Cassandra connector, however, you can allow SQL querying of your data in Cassandra. Minimal configuration is a simple catalog file like etc/catalog/sitedata for a Cassandra cluster tracking all user interaction on a website, for example

etc/catalog/sitedata-> etc/catalog/sitedata.properties

Note from the Author or Editor:
Fixed

Huang Pengcheng  May 02, 2020 
ePub
Page 380
Chapter 7 Streaming System Connector Example: Kafka

The original text says: "Kafka messages within a topic use different formats."

However, in my view, the messages in a single Kafka topic usually use the same format, so that downstream consumption is simple and efficient. Furthermore, Presto cannot handle messages of different formats within one topic, because the dataFormat property for the message is a single string value.

Maybe "Kafka messages within different topics use different formats." will be better.

Note from the Author or Editor:
Reworded

Huang Pengcheng  May 02, 2020 
ePub
Page 410
Chapter 7 Document Store Connector Example: Elasticsearch

"Also note that this individual connection to specific shards also happens in typical Elasticsearch clusters where the cluster runs behind a load balancer and is just exposed via a DNS hostname."

I wonder whether the Presto Elasticsearch connector can query this kind of cluster at all, since the only exposed endpoint is a DNS hostname and an external Presto cluster cannot reach the individual nodes of the Elasticsearch cluster.

Note from the Author or Editor:
Unless a firewall blocks the individual node names, it does work that way, so I am leaving the docs as is.

Huang Pengcheng  May 02, 2020 
ePub
Page 420
Chapter8 Tables

The INSERT statement is missing a closing parenthesis.

INSERT INTO iris (
sepal_length_cm,
sepal_width_cm,
petal_length_cm,
petal_width_cm,
species )
VALUES
( ... )

Note from the Author or Editor:
The ".." implied that more is coming, but sure, I added the ).

Huang Pengcheng  May 02, 2020 
ePub
Page 440
Chapter 9 String Functions and Operators

In Table 9-7, the description of the concat function reads "Equivalent to the operator."

The operator || is missing.

Note from the Author or Editor:
Fixed

Huang Pengcheng  May 06, 2020 
ePub
Page 450
Chapter 9 GROUP BY and HAVING Clauses

The SQL below "Here is the full query:" is almost identical to the next SQL statement, except for the specific number. I don't think it is a good example for demonstrating the HAVING clause. Or is this an erratum?

Thanks!

Note from the Author or Editor:
This is correct as is

Huang Pengcheng  May 03, 2020 
ePub
Page 480
Chapter 9 Regular Expressions


There should be an example such as "regexp_like(abc, d*)" for "* Zero or more" in Table 9-11.
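The suggested `d*` example is instructive because the pattern can match the empty string: regexp_like tests whether the pattern matches anywhere in the string, so `regexp_like('abc', 'd*')` is true even though "abc" contains no "d". The Python sketch below mimics that search behavior with `re.search`; it is an analogy only, since Presto uses Java pattern syntax rather than Python's, and the Python function name is a hypothetical stand-in.

```python
import re


def regexp_like(string: str, pattern: str) -> bool:
    """Hypothetical Python stand-in for Presto's regexp_like.

    True if the pattern matches anywhere in the string (search semantics,
    not a full-string match).
    """
    return re.search(pattern, string) is not None


print(regexp_like("abc", "d*"))   # True: "d*" matches the empty string
print(regexp_like("abc", "d+"))   # False: at least one "d" is required
print(regexp_like("abc", "b*c"))  # True: matches "bc"
```

Contrasting `d*` with `d+` shows the difference between "zero or more" and "one or more" in a single line each.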

Note from the Author or Editor:
Fixed

Huang Pengcheng  May 07, 2020 
ePub
Page 520
Chapter 9 Prepared Statements

There is a duplicated DEALLOCATE in the last section.

DEALLOCATE PREPARE delay_query;
*DEALLOCATE*

Note from the Author or Editor:
Fixed

Huang Pengcheng  May 07, 2020