Errata for Learning Apache Drill

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Printed, PDF, ePub, Mobi, Other Digital Version Page 133
Chapter intro

Chapter 8 describes how to work with Drill's "Schema on Read" features. The chapter discusses several forms of "schema ambiguity" that can arise. The text box on page 155 describes some of the issues. The text box on page 144 mentions that, at the time of writing, the Drill project was working on possible solutions.

Drill 1.16 introduces a "schema provisioning" mechanism to solve these problems. See the documentation for the CREATE OR REPLACE SCHEMA command for details: https://drill.apache.org/docs/create-or-replace-schema/

In Drill 1.16, the text (CSV) reader provides the ability to specify column types and default values. The Drill project continues to roll out the feature to other file formats. If you encounter the schema issues described in Chapter 8, consult the Drill documentation to determine whether a "schema provisioning" solution is available for your file format.
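
As a rough illustration of what schema provisioning looks like for a text table, the sketch below declares column types, a default value, and a date format. The table `dfs`.`tmp`.`text_table` and the column names are hypothetical, and the example assumes the files have a header row that Drill reads as column names. The two session options are the ones the Drill 1.16 documentation lists for enabling the feature; treat them as an assumption and confirm them for your Drill version.

```sql
-- Assumption: Drill 1.16 needs these options enabled before a provisioned
-- schema is applied to text files; check the documentation for your version.
ALTER SESSION SET `exec.storage.enable_v3_text_reader` = true;
ALTER SESSION SET `store.table.use_schema_file` = true;

-- Hypothetical table directory and columns, used only for illustration.
CREATE OR REPLACE SCHEMA (
  `id` INT NOT NULL DEFAULT '-1',
  `name` VARCHAR,
  `start_date` DATE FORMAT 'yyyy-MM-dd'
)
FOR TABLE `dfs`.`tmp`.`text_table`;

-- Later queries against the table should pick up the declared types.
SELECT `id`, `name`, `start_date` FROM `dfs`.`tmp`.`text_table`;
```

With a schema in place, the reader no longer has to guess types from the data, which is the source of the ambiguity that Chapter 8 discusses.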

Paul Rogers  Jun 18, 2019 
Printed, PDF, ePub, Mobi, Other Digital Version Page 138
Step 3, "Edit the JSON"...

Step 3 shows how to define a workspace dfs.data for the sample data files provided in the book's GitHub directory. The location shown omits the final path component. The correct text is:

"location": "/Users/arina/drillbook/data",

That is, add "/data" to the location where you've cloned the book's GitHub project. In the example, "/Users/arina/drillbook" stands for that download location.

Paul Rogers  Jun 18, 2019 
Printed, PDF, ePub, Mobi, Other Digital Version Page 138
Bottom of page

In multiple places, Chapter 8 refers to the "cust.csv" file in the book's GitHub repo. The repo was reorganized after the chapter was written, and the "cust.csv" file is now in the "cust" directory within the repo. An example such as the following on page 138:

SELECT * FROM `local`.`data`.`cust.csv`

Should be:

SELECT * FROM `local`.`data`.`cust/cust.csv`

The same issue occurs on page 139 (when using a default schema), on page 141 (under Format Inference), and elsewhere in the chapter.

Paul Rogers  Jun 18, 2019 
Printed, PDF, ePub, Mobi, Other Digital Version Page 139
Second example under "Default Schema"

The example here assumes you've created the "local" storage config as explained on page 137, "Storage Configurations". The `local` config lets you access files on your local machine. In a typical production Drill deployment, `dfs` is configured to work with your Hadoop, S3, or other distributed storage system.

If you did not follow along and create that config, and your `dfs` config still points to your local file system (as it does when you first install Drill), then you can replace `local` with `dfs.root`, which does the same thing.

Also, the path shown does not reflect the final structure of the book's GitHub repo. The correct examples should be:

USE `local`;
SELECT * FROM `/Users/home/arina/drillbook/data/cust/cust.csv`;

Here, "/Users/home/arina/drillbook" stands for the directory where you cloned the book's GitHub repo.

Alternatively, if you did not create the `local` config:

USE `dfs`.`root`;
SELECT * FROM `/Users/home/arina/drillbook/data/cust/cust.csv`;

Paul Rogers  Jun 18, 2019 
Printed, PDF, ePub, Mobi, Other Digital Version Page 162
JSON Objects section

The book was written for Drill 1.13. Since that time, the Drill project has made a number of important improvements. One of these is support for a "lateral join", a way to query nested data as though it were a separate SQL table.

The "JSON Objects" section describes how to work with JSON objects, and the "JSON Lists in Drill" section, page 164, describes how to work with lists, including lists of objects. At the time of Drill 1.13, support for lists of objects was somewhat limited.

However, with the addition of the lateral join feature in Drill 1.15, JSON (and Parquet) nested lists become much easier to use. Please see the Drill documentation for details: https://drill.apache.org/docs/lateral-join/
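
To give a sense of the feature, the following sketch flattens a list of objects into rows using LATERAL and UNNEST, following the pattern shown in the Drill documentation. The file `customers.json` and its fields (`name`, and an `orders` list whose objects have `id` and `amount`) are hypothetical; the `local`.`data` workspace is the one set up in Chapter 8.

```sql
-- Hypothetical file: each record has a `name` field and an `orders` list
-- of objects, each with `id` and `amount` fields.
SELECT c.`name`, o.order_id, o.amount
FROM `local`.`data`.`customers.json` c,
LATERAL (
  -- UNNEST exposes each element of the list as one row named `ord`.
  SELECT t.ord.`id` AS order_id, t.ord.`amount` AS amount
  FROM UNNEST(c.`orders`) t(ord)
) o;
```

Each customer row is joined with the rows produced from its own `orders` list, so the nested list can be filtered, aggregated, or sorted like an ordinary table.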

Paul Rogers  Jun 18, 2019 
Printed, PDF, ePub, Mobi, Other Digital Version Page 221
Chapter intro

Chapter 12 describes how to write a storage plugin. Since the chapter was written, the Drill project has made significant improvements to the internal mechanisms used to create a plugin. Most of Chapter 12 is still valid. However, you should be aware of the new mechanisms by consulting the link below.

In particular, the details of "Creating the Format Plug-in Class", page 230, should change to use a new config class. Also, "The Record Reader", page 236, describes the old record reader format; the newer version is considerably simpler and more powerful.

A tutorial written by the book's co-author describes the newly added mechanisms. It assumes you've read Chapter 12 and is available here: https://github.com/paul-rogers/drill/wiki/Developer%27s-Guide-to-the-Enhanced-Vector-Framework

Paul Rogers  Jun 18, 2019