Using MapReduce to do grouping and counting with Cassandra input and output

Many grid computing systems divide a problem into smaller sub-problems and distribute them across many nodes. Hadoop's distributed computing model is MapReduce. A MapReduce job has a map phase, a shuffle-and-sort phase that uses a Partitioner to guarantee that identical keys go to the same reducer, and finally a reduce phase. This recipe walks through the word_count application in the Cassandra contrib directory. Grouping and counting is a problem well suited to MapReduce.
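The three phases described above can be sketched in plain Java, without Hadoop or Cassandra on the classpath. This is only an in-memory illustration and not the contrib word_count code: the class name WordCountSketch and its methods map, partition, reduce, and run are hypothetical names chosen for this sketch, and the input is a hard-coded list of strings rather than Cassandra columns. The partition method uses a hash-modulo scheme, which is assumed here as a simple way to send identical keys to the same reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of map -> shuffle/sort -> reduce for word counting.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one line of text.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Partitioner: identical keys always map to the same reducer index.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum all the counts that were grouped under one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs all three phases over the input lines with the given reducer count.
    static Map<String, Integer> run(List<String> lines, int numReducers) {
        // Shuffle and sort: one sorted bucket (word -> counts) per reducer.
        List<TreeMap<String, List<Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                int reducer = partition(pair.getKey(), numReducers);
                buckets.get(reducer)
                       .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Each reducer processes only the keys in its own bucket.
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> bucket : buckets) {
            for (Map.Entry<String, List<Integer>> entry : bucket.entrySet()) {
                result.put(entry.getKey(), reduce(entry.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog", "The fox");
        System.out.println(run(lines, 3));
    }
}
```

Because every occurrence of a word lands in the same bucket, each reducer can total its words without consulting any other reducer, which is what makes grouping and counting a natural fit for this model.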

Note

More information on MapReduce can be found at http://en.wikipedia.org/wiki/MapReduce.

Getting ready

The complete code for this example is found here: http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/contrib/word_count/ ...
