July 2015
One advantage of Accumulo’s integration with Hadoop is that MapReduce jobs can read input from Accumulo tables and write results back to them. This is useful for ingesting large amounts of data quickly, for analyzing data already stored in Accumulo tables, or for exporting data from Accumulo tables to HDFS.
Accumulo provides MapReduce input and output formats that read from and write to Accumulo tables directly. These formats exist for both MapReduce APIs: org.apache.hadoop.mapred and org.apache.hadoop.mapreduce.
A MapReduce job can read input from an Accumulo table, write output to an Accumulo table, or both.
To configure a MapReduce job to read input from an Accumulo table, use code similar to the following:
job.setInputFormatClass(AccumuloInputFormat.class);
AccumuloInputFormat.setInputTableName(job, "table_name");

ClientConfiguration zkiConfig = new ClientConfiguration()
    .withInstance("myInstance")
    .withZkHosts("zoo1:2181,zoo2:2181");
AccumuloInputFormat.setZooKeeperInstance(job, zkiConfig);
AccumuloInputFormat.setConnectorInfo(job, "username",
    new PasswordToken("password"));

List<Pair<Text,Text>> columns = new ArrayList<>();
columns.add(new Pair<>(new Text("colFam"), new Text("colQual")));
AccumuloInputFormat.fetchColumns(job, columns); // optional

List<Range> ranges = new ArrayList<>();
ranges.add(new Range("a", "k"));
AccumuloInputFormat.setRanges(job, ranges); // optional

AccumuloInputFormat.setScanIsolation(job, true); // optional
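With AccumuloInputFormat configured as above, each call to the mapper receives one Accumulo entry as a Key/Value pair. The following is a minimal sketch of such a mapper; the class name, the choice of emitting row IDs, and the Text output types are illustrative, not part of the Accumulo API:

```java
import java.io.IOException;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: AccumuloInputFormat supplies Key/Value as the
// input key and value classes; output types here are arbitrary.
public class RowValueMapper extends Mapper<Key, Value, Text, Text> {
  @Override
  protected void map(Key key, Value value, Context context)
      throws IOException, InterruptedException {
    // Each invocation sees one Accumulo entry; emit the row ID
    // along with the stored value as text.
    context.write(key.getRow(), new Text(value.get()));
  }
}
```

A job using this mapper would set it with job.setMapperClass(RowValueMapper.class) alongside the input-format configuration shown above.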