Hadoop: The Definitive Guide

Errata for Hadoop: The Definitive Guide

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
ePub Page Online github
distcp command

I could find no other place to post this. I could not find an email for Tom White on the github repo.

I am unable to copy the weather data files to an Amazon EC2 cluster node. I am in the N. Virginia zone.

All attempts to copy or even list the files fail with ResponseCode=403, ResponseMessage=Forbidden.

Is this a permissions thing?


[ec2-user@ip-10-40-75-9 ~]$ hadoop distcp -Dfs.s3n.awsAccessKeyId=<AccessKey> -Dfs.s3n.awsSecretAccessKey=<SecretKey> s3n://hadoopbook/ncdc/all input/ncdc/all
13/11/28 23:33:51 INFO tools.DistCp: srcPaths=[s3n://hadoopbook/ncdc/all]
13/11/28 23:33:51 INFO tools.DistCp: destPath=input/ncdc/all
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/ncdc%2Fall_%24folder%24' - ResponseCode=403, ResponseMessage=Forbidden
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Lance Smith  Nov 28, 2013 
Printed Page 282
3rd line

On the 3rd line of page 282, there is an example of Hadoop Streaming for secondary sort.

The example uses an option, -D mapred.text.key.comparator.options="-k1n -k2nr", to sort records based on the 1st and the 2nd columns in a composite key.

However, it does not work, because mapred.text.key.comparator.options uses the Linux/Unix sort option format, -k[start_field],[end_field].
This means that if you omit the end field, the sort key is the substring from [start_field] to the end of the composite key.
You can see how it extracts columns from a composite key in the KeyFieldHelper.getStartOffset and getEndOffset methods.

In addition, similar secondary sort examples in the Apache Hadoop documentation also use -k options with an explicit end field, such as -k1,1n and -k2,2nr.
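
The field-span rule described in this erratum can be illustrated with a small Python sketch. This is not Hadoop code: key_span is a hypothetical helper, the tab separator and the sample key are assumptions, and it models only which part of a composite key a -k spec selects, not how the comparator then orders the selected bytes.

```python
def key_span(key, start_field, end_field=None, sep="\t"):
    """Return the part of `key` covered by a sort-style -k<start>[,<end>] spec.

    Fields are 1-based. When end_field is omitted, the span runs from
    start_field to the end of the key, as with Unix sort.
    Hypothetical helper for illustration only.
    """
    fields = key.split(sep)
    last = len(fields) if end_field is None else end_field
    return sep.join(fields[start_field - 1:last])


key = "1901\t0317"  # assumed composite key: two tab-separated fields

print(repr(key_span(key, 1)))     # -k1   selects the whole key
print(repr(key_span(key, 1, 1)))  # -k1,1 selects only the first field
print(repr(key_span(key, 2, 2)))  # -k2,2 selects only the second field
```

Under these assumptions, -k1 covers both fields, so the first sort key already includes the second column; -k1,1 and -k2,2 restrict each key to a single field, which is what a secondary sort needs.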

Han-Cheol Cho  Jun 18, 2015