Using MapReduce to do grouping and counting with Cassandra input and output
Many types of grid computing systems can divide a problem into smaller sub-problems and distribute them across many nodes. Hadoop's distributed computing model uses MapReduce. MapReduce has a map phase, a shuffle and sort step that uses a Partitioner to guarantee that identical keys go to the same reducer, and finally a reduce phase. This recipe shows the word_count application in the Cassandra contrib directory. Grouping and counting is an ideal problem for MapReduce to solve.
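The three phases can be sketched in a few lines. The following is a minimal single-process simulation in plain Python, not the Hadoop or Cassandra API and not the word_count code from contrib. All function names (map_phase, partition, shuffle, reduce_phase, word_count) are hypothetical, and the CRC32-based partitioner is an assumption chosen only because it is deterministic.

```python
import zlib
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key, num_reducers):
    """Pick a reducer from a stable hash, so identical keys always agree."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Shuffle: route each pair to its reducer's bucket and group by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(key, values):
    """Reduce: collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines, num_reducers=2):
    """Run map, shuffle/sort, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    counts = {}
    for bucket in shuffle(pairs, num_reducers):
        # Each simulated reducer sees its keys in sorted order.
        for key in sorted(bucket):
            word, total = reduce_phase(key, bucket[key])
            counts[word] = total
    return counts
```

Calling word_count(["the quick fox", "the lazy dog"]) returns a dictionary of words to counts. Because the partitioner depends only on the key, every occurrence of a word lands in the same bucket, which is what allows each reducer to produce a complete count on its own.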
Note
More information on MapReduce can be found at http://en.wikipedia.org/wiki/MapReduce.
Getting ready
The complete code for this example is found here: http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/contrib/word_count/ ...