Chapter 7. MapReduce API

One advantage of Accumulo’s integration with Hadoop is that MapReduce jobs can read their input from Accumulo tables and write their results to Accumulo tables. This is useful for ingesting large amounts of data quickly, for analyzing data stored in Accumulo tables, and for exporting data from Accumulo tables to HDFS.

Formats

Accumulo provides MapReduce input and output formats that read from Accumulo and write to Accumulo directly. There are input and output formats for both MapReduce APIs: org.apache.hadoop.mapred and org.apache.hadoop.mapreduce.

A MapReduce job can read input from an Accumulo table, write output to an Accumulo table, or both.

To configure a MapReduce job to read input from an Accumulo table, use code similar to the following:

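// Imports assumed for this fragment (Accumulo 1.x package names):
import java.util.ArrayList;
import java.util.List;
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.util.Pair;
import org.apache.hadoop.io.Text;

// job is an existing org.apache.hadoop.mapreduce.Job
job.setInputFormatClass(AccumuloInputFormat.class);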

AccumuloInputFormat.setInputTableName(job, "table_name");

ClientConfiguration zkiConfig = new ClientConfiguration()
            .withInstance("myInstance")
            .withZkHosts("zoo1:2181,zoo2:2181");

AccumuloInputFormat.setZooKeeperInstance(job, zkiConfig);
AccumuloInputFormat.setConnectorInfo(job, "username",
    new PasswordToken("password"));

List<Pair<Text,Text>> columns = new ArrayList<>();
columns.add(new Pair<>(new Text("colFam"), new Text("colQual")));
AccumuloInputFormat.fetchColumns(job, columns); // optional

List<Range> ranges = new ArrayList<>();
ranges.add(new Range("a", "k"));
AccumuloInputFormat.setRanges(job, ranges); // optional

AccumuloInputFormat.setScanIsolation ...
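
The optional column and range settings in the listing above narrow the set of entries that reach the job's mappers. The following is a minimal plain-Java sketch of that narrowing, not Accumulo code: the class ScanRestrictionSketch, its Entry type, and its methods are hypothetical names introduced only for illustration. It assumes that Range("a", "k") includes both end rows, and it uses String.compareTo as a stand-in for Accumulo's byte-wise row ordering, which agrees for the ASCII rows used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of scan restrictions. This is NOT the Accumulo API;
// every name in this class is hypothetical and exists only for illustration.
final class ScanRestrictionSketch {

    // Stand-in for the parts of an Accumulo key used in this sketch.
    static final class Entry {
        final String row;
        final String family;
        final String qualifier;

        Entry(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier;
        }
    }

    // Models new Range("a", "k"): both end rows are treated as inclusive.
    // String.compareTo stands in for byte-wise row ordering (same for ASCII).
    static boolean inRange(String row, String startRow, String endRow) {
        return row.compareTo(startRow) >= 0 && row.compareTo(endRow) <= 0;
    }

    // Returns the entries that pass both the row range and the fetched column.
    static List<String> select(List<Entry> table, String startRow, String endRow,
                               String family, String qualifier) {
        List<String> selected = new ArrayList<>();
        for (Entry e : table) {
            boolean columnMatches =
                e.family.equals(family) && e.qualifier.equals(qualifier);
            if (inRange(e.row, startRow, endRow) && columnMatches) {
                selected.add(e.toString());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Entry> table = Arrays.asList(
            new Entry("apple", "colFam", "colQual"),
            new Entry("banana", "colFam", "other"),
            new Entry("k", "colFam", "colQual"),
            new Entry("kiwi", "colFam", "colQual"));
        // Same restrictions as the listing: rows "a" through "k", colFam:colQual.
        for (String s : select(table, "a", "k", "colFam", "colQual")) {
            System.out.println(s);
        }
    }
}
```

In this model the rows apple and k are selected. The row banana is dropped because its column was not fetched, and kiwi is dropped because it sorts after the end row k even though it begins with the same letter.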
