Errata

Programming Hive

Errata for Programming Hive


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color Key: Serious technical mistake | Minor technical mistake | Language or formatting error | Typo | Question | Note | Update

Version Location Description Submitted by Date submitted
Printed

The "code" can be downloaded from: https://github.com/oreillymedia/programming_hive

I've downloaded this .zip file and DO NOT see any code. All I see is .txt and .csv files of data, and no code.

For example, where is the sample code for Chapter 13?

If there is no code, then maybe the link should be renamed to "download data".

graham seed  Nov 06, 2017 
Printed Page 36
2nd paragraph

Original Content:

$ echo "one row" > /tmp/myfile

$ hive -e "LOAD DATA LOCAL INPATH '/tmp/file' INTO TABLE src;

Notice that the Hive query is not closed with a double quote at the end.

It should be:

$ hive -e "LOAD DATA LOCAL INPATH '/tmp/file' INTO TABLE src;"



Bharath Kumar Gajjela  Mar 20, 2016 
PDF Page 57
Note Section

The Note section says that if you omit the EXTERNAL keyword and the original table is external, the new table will also be external.

I tried this and it created a managed table. I used the following sequence:

1. Created an External Table
2. Select * from external_table fetches me around 100 rows
3. Created a Table (omitting external keyword)
4. describe formatted new_table shows table_type as MANAGED
5. select * from new_table fetches around 100 rows
6. dropped the new_table
7. select * from external_table fetches ZERO rows, as the data was also deleted in step 6
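
For reference, the sequence above can be sketched in HiveQL (the table names and location are illustrative, and step 2 assumes the new table was created over the same data directory):

```sql
-- 1. An external table over some directory (names are hypothetical):
CREATE EXTERNAL TABLE external_table (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/external_table';

-- 2. Copy the schema, omitting EXTERNAL; despite the Note, the copy
--    comes out MANAGED (DESCRIBE FORMATTED reports Table Type: MANAGED_TABLE):
CREATE TABLE new_table LIKE external_table
LOCATION '/data/external_table';

-- 3. Dropping the managed copy deletes the shared data directory,
--    which empties the original external table as well:
DROP TABLE new_table;
```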

Regards
Vivek

Vivek Sharma  Jan 31, 2017 
PDF Page 65
last code example

EXCHANGE became a reserved keyword in Hive as of version 1.2.2, according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL.

So all the examples involving the stocks table throughout the book will no longer work. You should suggest that readers use "`exchange`" or change the column name.
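
For instance, on affected Hive versions the examples could quote the column with backticks (a sketch, assuming quoted identifiers are enabled, as they are by default since Hive 0.13):

```sql
-- Backticks allow the reserved word to be used as a column name:
CREATE TABLE stocks (
  `exchange` STRING,
  symbol STRING,
  price_close FLOAT
);

SELECT s.`exchange`, s.symbol FROM stocks s;
```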

ZHANG Xiaodong  Mar 29, 2017 
Printed, PDF Page 68
The description in Adding Columns section

'If any of the new columns are in the wrong position, use ALTER COLUMN table CHANGE COLUMN statement for each one to move it to the correct position'

The statement should be corrected to:

...use ALTER TABLE table CHANGE COLUMN..
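
For illustration, a corrected statement might look like the following (the table and column names here are hypothetical):

```sql
-- Move the column severity so it sits immediately after the column hms:
ALTER TABLE log_messages
CHANGE COLUMN severity severity STRING
AFTER hms;
```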

Dhanashree N P  Apr 16, 2018 
Printed Page 76
5th paragraph

In the 5th paragraph, the following statement explains the use of a URI path when exporting data:

The specified path can also be a full URI (e.g., hdfs://master-server/tmp/ca_employees)


I believe a sample query could help here:

INSERT OVERWRITE DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address FROM employees se
WHERE se.state = 'CA';


Bharath Kumar Gajjela  Oct 02, 2016 
Printed, PDF Page 76
6th paragraph

'As a reminder, we can look at the results from within hive CLI:
...
hive> ! cat /tmp/payroll/000000_0
John Doe100000.0201 San Antonio CircleMountain ViewCA94040
Mary Smith80000.01 Infinity LoopCupertinoCA95014
....
'

It should have been:

hive> ! cat /tmp/ca_employees/000000_0

Dhanashree N P  Apr 16, 2018 
Printed, PDF Page 77
1st Paragraph

The field delimiter for the table can be problematic. For example, if it uses the default ^A delimiter. If you export table data frequently, it might be appropriate to use comma or tab delimiters

The second sentence -- "if it uses the default ^A delimiter" -- seems incomplete.

Also, would it be possible to clarify this with an example?
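
If it helps, here is one possible example, assuming the intended point is that a table declared with the default ^A delimiter exports ^A-delimited files, whereas declaring a comma delimiter up front yields CSV-like output (the table name is illustrative):

```sql
-- Declaring an explicit comma delimiter so exported files are easy to consume:
CREATE TABLE export_friendly (name STRING, salary FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```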

Dhanashree N P  Apr 16, 2018 
PDF Page 91
Nested SELECT Statements

NESTED SELECT STATEMENTS
hive> FROM (
> SELECT upper(name), salary, deductions["Federal Taxes"] as fed_taxes,
> round(salary * (1 - deductions["Federal Taxes"])) as salary_minus_fed_taxes
> FROM employees
> ) e
> SELECT e.name, e.salary_minus_fed_taxes
> WHERE e.salary_minus_fed_taxes > 70000;

This job fails because there is no alias for name in the inner query, yet the outer query accesses the column by its original name. I tried the query below and it works, because I added an alias:

FROM (
SELECT upper(name) AS UPPER_NAME, salary, deductions["Federal Taxes"] as fed_taxes,
round(salary * (1 - deductions["Federal Taxes"])) as salary_minus_fed_taxes
FROM default.employees1
)e
SELECT e.UPPER_NAME, e.salary_minus_fed_taxes
WHERE e.salary_minus_fed_taxes > 70000;





Bharath  Jul 06, 2015 
PDF Page 96
«LIKE and RLIKE», 2nd paragraph

Minor typo: the description says «... the street contains Chicago», but the code line is «WHERE address.street LIKE '%Chi%';».

It looks like either the code line should be updated to «WHERE address.street LIKE '%Chicago%'», or the description should be changed to «... the street contains Chi».

Alena Hardynets  Jan 13, 2017 
PDF Page 112
UNION ALL, 2nd paragraph

«Here is an example the merges log data:» sounds bizarre; I guess it should have been «an example THAT merges log data».

Alena Hardynets  Jan 15, 2017 
PDF Page 118
CREATE INDEX

This error has already been reported, but the suggested solutions do not work in either case. The following should be the code at the top of page 118:

CREATE INDEX employees_index
ON TABLE employees (country, name)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
PARTITIONED BY (country)
COMMENT 'Employees indexed by country and name.';

The second example on 118 should be:

CREATE INDEX employees_index
ON TABLE employees (country, name)
AS 'BITMAP'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time') IN TABLE employees_index_table
PARTITIONED BY (country)
COMMENT 'Employees indexed by country and name.';

The following exception is thrown in both cases:
FAILED: ParseException line 6:0 missing EOF at 'PARTITIONED' near 'employeeex_table'

Anonymous  Nov 15, 2016 
ePub Page 119
Bullet 2

The s3n directory needs to be consistent.

While doing a copy it is "s3n://ourbucket/logs/2011/12/02" but while adding the partition the directory location is 's3n://ourbucket/logs/2011/01/02'

See excerpts below

Copy the data for the partition being moved to S3. For example, you can use the hadoop distcp command:

hadoop distcp /data/log_messages/2011/12/02 s3n://ourbucket/logs/2011/12/02

Alter the table to point the partition to the S3 location:
ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
SET LOCATION 's3n://ourbucket/logs/2011/01/02';

Adebiyi Abdurrahman  Aug 09, 2015 
Printed Page 171
code block

append("returns"); should be append("returns ");

Anonymous  Nov 06, 2017 
PDF Page 283
3rd paragraph, in the source code

It was reported by Zheyi Rong, but the condition provided is still wrong.

SUM(IF(b.b_timestamp + 1800 >= a.a_timestamp AND
b.b_timestamp < a.a_timestamp,1,0)) AS c_nonorigin_flags

should be

SUM(IF(a.a_timestamp >= b.b_timestamp + 1800 AND
a.a_timestamp > b.b_timestamp,1,0)) AS c_nonorigin_flags

Zheyi's version of the condition would always be false.

The condition appears on both pages 283 and 284.

Ruidian(Dean) Ye   Feb 21, 2018