Errata

Programming Hive

Errata for Programming Hive


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color Key: Serious technical mistake | Minor technical mistake | Language or formatting error | Typo | Question | Note | Update

Version Location Description Submitted by Date submitted
Printed

The "code" can be downloaded from: https://github.com/oreillymedia/programming_hive

I've downloaded this .zip file and DO NOT see any code. All I see is .txt and .csv files of data, and no code.

For example, where is the sample code for Chapter 13?

If there is no code, then maybe the link should be renamed to "download data".

graham seed  Nov 06, 2017 
Printed Page 36
2nd paragraph

Original Content:

$ echo "one row" > /tmp/myfile

$ hive -e "LOAD DATA LOCAL INPATH '/tmp/file' INTO TABLE src;

Notice that the Hive query is not closed with a double quote at the end.

It should be:

$ hive -e "LOAD DATA LOCAL INPATH '/tmp/file' INTO TABLE src;"



Bharath Kumar Gajjela  Mar 20, 2016 
PDF Page 57
Note Section

The Note section says that if you omit the EXTERNAL keyword and the original table is external, the new table will also be external.

I tried this and it created a managed table. I used the following sequence:

1. Created an External Table
2. Select * from external_table fetches me around 100 rows
3. Created a Table (omitting external keyword)
4. describe formatted new_table shows table_type as MANAGED
5. select * from new_table fetches around 100 rows
6. dropped the new_table
7. select * from external_table fetches ZERO rows, as the data was also deleted in step 6
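
For reference, the sequence above can be sketched in HiveQL (the table names and location are illustrative, and step 2 assumes the new table was created over the same data directory):

```sql
-- 1. An external table over some directory (names are hypothetical):
CREATE EXTERNAL TABLE external_table (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/external_table';

-- 2. Copy the schema, omitting EXTERNAL; despite the Note, the copy
--    comes out MANAGED (DESCRIBE FORMATTED reports Table Type: MANAGED_TABLE):
CREATE TABLE new_table LIKE external_table
LOCATION '/data/external_table';

-- 3. Dropping the managed copy deletes the shared data directory,
--    which empties the original external table as well:
DROP TABLE new_table;
```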

Regards
Vivek

Vivek Sharma  Jan 31, 2017 
PDF Page 65
last code example

EXCHANGE became a reserved keyword in Hive as of version 1.2.2, according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL.

So all the examples involving the stocks table throughout the book will no longer work. You should suggest that readers use "`exchange`" or change the column name.
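
For instance, on affected Hive versions the examples could quote the column with backticks (a sketch, assuming quoted identifiers are enabled, as they are by default since Hive 0.13):

```sql
-- Backticks allow the reserved word to be used as a column name:
CREATE TABLE stocks (
  `exchange` STRING,
  symbol STRING,
  price_close FLOAT
);

SELECT s.`exchange`, s.symbol FROM stocks s;
```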

ZHANG Xiaodong  Mar 29, 2017 
Printed, PDF Page 68
The description in Adding Columns section

'If any of the new columns are in the wrong position, use ALTER COLUMN table CHANGE COLUMN statement for each one to move it to the correct position'

The statement should be corrected to:

...use ALTER TABLE table CHANGE COLUMN..
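
For illustration, a corrected statement might look like the following (the table and column names here are hypothetical):

```sql
-- Move the column severity so it sits immediately after the column hms:
ALTER TABLE log_messages
CHANGE COLUMN severity severity STRING
AFTER hms;
```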

Dhanashree N P  Apr 16, 2018 
Printed Page 76
5th paragraph

In the 5th paragraph, the following statement explains the use of a URI path when exporting data:

The specified path can also be a full URI (e.g., hdfs://master-server/tmp/ca_employees)


I believe a sample query could help here:

INSERT OVERWRITE DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address FROM employees se
WHERE se.state = 'CA';


Bharath Kumar Gajjela  Oct 02, 2016 
Printed, PDF Page 76
6th paragraph

'As a reminder, we can look at the results from within hive CLI:
...
hive> ! cat /tmp/payroll/000000_0
John Doe100000.0201 San Antonio CircleMountain ViewCA94040
Mary Smith80000.01 Infinity LoopCupertinoCA95014
....
'

It should have been:

hive> ! cat /tmp/ca_employees/000000_0

Dhanashree N P  Apr 16, 2018 
Printed, PDF Page 77
1st Paragraph

The field delimiter for the table can be problematic. For example, if it uses the default ^A delimiter. If you export table data frequently, it might be appropriate to use comma or tab delimiters

The second sentence -- "if it uses the default ^A delimiter" -- seems incomplete.

Also, would it be possible to clarify this with an example?
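
If it helps, here is one possible example, assuming the intended point is that a table declared with the default ^A delimiter exports ^A-delimited files, whereas declaring a comma delimiter up front yields CSV-like output (the table name is illustrative):

```sql
-- Declaring an explicit comma delimiter so exported files are easy to consume:
CREATE TABLE export_friendly (name STRING, salary FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```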

Dhanashree N P  Apr 16, 2018 
PDF Page 91
Nested SELECT Statements

NESTED SELECT STATEMENTS
hive> FROM (
> SELECT upper(name), salary, deductions["Federal Taxes"] as fed_taxes,
> round(salary * (1 - deductions["Federal Taxes"])) as salary_minus_fed_taxes
> FROM employees
> ) e
> SELECT e.name, e.salary_minus_fed_taxes
> WHERE e.salary_minus_fed_taxes > 70000;

This job fails because there is no alias for name in the inner query, yet the outer query accesses the column by its original name. I tried the query below and it works, because I added an alias:

FROM (
SELECT upper(name) AS UPPER_NAME, salary, deductions["Federal Taxes"] as fed_taxes,
round(salary * (1 - deductions["Federal Taxes"])) as salary_minus_fed_taxes
FROM default.employees1
)e
SELECT e.UPPER_NAME, e.salary_minus_fed_taxes
WHERE e.salary_minus_fed_taxes > 70000;





Bharath  Jul 06, 2015 
PDF Page 96
«LIKE and RLIKE», 2nd paragraph

Minor typo: the description says «... the street contains Chicago», but the code line is «WHERE address.street LIKE '%Chi%';».

It looks like either the code line should be updated to «WHERE address.street LIKE '%Chicago%'», or the description should be changed to «... the street contains Chi».

Alena Hardynets  Jan 13, 2017 
PDF Page 112
UNION ALL, 2nd paragraph

«Here is an example the merges log data:» sounds bizarre; I guess it should have been «an example THAT merges log data».

Alena Hardynets  Jan 15, 2017 
PDF Page 118
CREATE INDEX

This error has already been reported, but the suggested solutions do not work in either case. The following should be the code at the top of page 118:

CREATE INDEX employees_index
ON TABLE employees (country, name)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
PARTITIONED BY (country)
COMMENT 'Employees indexed by country and name.';

The second example on 118 should be:

CREATE INDEX employees_index
ON TABLE employees (country, name)
AS 'BITMAP'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time') IN TABLE employees_index_table
PARTITIONED BY (country)
COMMENT 'Employees indexed by country and name.';

The following exception is thrown in both cases:
FAILED: ParseException line 6:0 missing EOF at 'PARTITIONED' near 'employeeex_table'

Anonymous  Nov 15, 2016 
ePub Page 119
Bullet 2

The s3n directory needs to be consistent.

While doing a copy it is "s3n://ourbucket/logs/2011/12/02" but while adding the partition the directory location is 's3n://ourbucket/logs/2011/01/02'

See excerpts below

Copy the data for the partition being moved to S3. For example, you can use the hadoop distcp command:

hadoop distcp /data/log_messages/2011/12/02 s3n://ourbucket/logs/2011/12/02

Alter the table to point the partition to the S3 location:
ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
SET LOCATION 's3n://ourbucket/logs/2011/01/02';

Adebiyi Abdurrahman  Aug 09, 2015 
Printed Page 171
code block

append("returns"); should be append("returns ");

Anonymous  Nov 06, 2017 
PDF Page 283
3rd paragraph, in the source code

It was reported by Zheyi Rong, but the condition provided is still wrong.

SUM(IF(b.b_timestamp + 1800 >= a.a_timestamp AND
b.b_timestamp < a.a_timestamp,1,0)) AS c_nonorigin_flags

should be

SUM(IF(a.a_timestamp >= b.b_timestamp + 1800 AND
a.a_timestamp > b.b_timestamp,1,0)) AS c_nonorigin_flags

Zheyi's version of the condition would always be false.

The condition appears on both pages 283 and 284.

Ruidian(Dean) Ye   Feb 21, 2018