Aggregating sources in Accumulo using MapReduce

In this recipe, we will use MapReduce and the AccumuloInputFormat class to count occurrences of each unique source stored in an Accumulo table.
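To make the goal concrete, here is a minimal sketch of the counting logic in plain Java. It imitates the map and reduce phases in memory and does not use Hadoop, Accumulo, or AccumuloInputFormat, so it is not the recipe's implementation. The class name, method names, and sample source values are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the map/reduce counting pattern.
public class SourceCountSketch {

    // "Map" phase: emit one (source, 1) pair for every input record.
    static List<Map.Entry<String, Integer>> map(List<String> sources) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String source : sources) {
            pairs.add(new SimpleEntry<String, Integer>(source, 1));
        }
        return pairs;
    }

    // "Reduce" phase: sum the emitted counts for each distinct source.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs both phases back to back on an in-memory list.
    public static Map<String, Integer> countSources(List<String> sources) {
        return reduce(map(sources));
    }

    public static void main(String[] args) {
        // Made-up source values, standing in for the source entries of a table.
        List<String> sample = Arrays.asList("Source A", "Source B", "Source A");
        System.out.println(countSources(sample)); // prints {Source A=2, Source B=1}
    }
}
```

In a real job, the framework groups the emitted pairs by key between the two phases; the sketch folds that grouping into the reduce step for brevity.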

Getting ready

This recipe is easiest to test on a pseudo-distributed Hadoop cluster with Accumulo 1.4.1 and ZooKeeper 3.3.3 installed. The shell script in this recipe assumes that ZooKeeper is running on localhost on port 2181; change these values to suit your environment. The Accumulo installation's bin folder needs to be on your environment path.

For this recipe, you'll need to create an Accumulo instance named test, with the user root and the password password.

You will need a table by the name acled to exist in the configured ...
