Using MapReduce to do grouping and counting with Cassandra input and output
Many types of grid computing systems can divide a problem into smaller sub-problems and distribute them across many nodes. Hadoop's distributed computing model uses MapReduce. MapReduce has a map phase, a shuffle sort that uses a Partitioner to guarantee that identical keys go to the same reducer, and finally a reduce phase. This recipe shows the word_count application in the Cassandra contrib directory. Grouping and counting is an ideal problem for MapReduce to solve.
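The three phases can be illustrated without a cluster. The following is a minimal in-memory sketch in plain Java, not the contrib word_count code: it uses no Hadoop or Cassandra classes, and the class name WordCountSketch and its method names are hypothetical, chosen only for illustration. The partition method assumes a simple hash-modulo scheme, which is one common way to make sure identical keys reach the same reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, self-contained illustration of map -> shuffle/partition -> reduce.
public class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every token in the input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Partitioner (assumed hash-modulo scheme): the same key always maps
    // to the same reducer index, so all of its values meet in one place.
    public static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum every value that was grouped under a single key.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Drives the three phases over a list of input lines.
    public static Map<String, Integer> wordCount(List<String> lines, int numReducers) {
        // Shuffle sort: one sorted bucket of key -> values per reducer.
        List<Map<String, List<Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                int reducer = partition(pair.getKey(), numReducers);
                buckets.get(reducer)
                       .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Each reducer processes only the keys in its own bucket.
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, List<Integer>> bucket : buckets) {
            for (Map.Entry<String, List<Integer>> entry : bucket.entrySet()) {
                result.put(entry.getKey(), reduce(entry.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog", "The fox");
        System.out.println(wordCount(lines, 3));
    }
}
```

Because every occurrence of a word lands in the same bucket, each reducer can compute a complete count for its keys without consulting the others. In a real job the input lines would come from Cassandra rows and the counts would be written back to a column family rather than printed.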
More information on MapReduce can be found at http://en.wikipedia.org/wiki/MapReduce.
The complete code for this example is found here: http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/contrib/word_count/ ...