Using MapReduce to do grouping and counting with Cassandra input and output

Many grid computing systems divide a problem into smaller sub-problems and distribute them across many nodes. Hadoop's distributed computing model uses MapReduce. MapReduce has a map phase, a shuffle-and-sort phase that uses a Partitioner to guarantee that identical keys go to the same reducer, and finally a reduce phase. This recipe shows the word_count application in the Cassandra contrib directory. Grouping and counting is a problem well suited to MapReduce.
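The three phases described above can be sketched in plain Java without a cluster. This is a minimal, single-process illustration and not the contrib word_count code: the class name WordCountSketch and all of its methods are hypothetical, the Hadoop and Cassandra input/output classes are deliberately left out, and the partition function is an assumption modeled on simple hash partitioning.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of map, shuffle/sort, and reduce.
class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Partitioner (assumed hash-based): the same key always yields the
    // same reducer index, so identical keys meet at the same reducer.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Shuffle and sort: route each pair to its reducer's partition and
    // group the values by key. TreeMap keeps keys sorted per partition.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> pairs, int numReducers) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> pair : pairs) {
            partitions.get(partition(pair.getKey(), numReducers))
                    .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                    .add(pair.getValue());
        }
        return partitions;
    }

    // Reduce phase: sum all the values grouped under one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run the three phases in order and merge the reducer outputs.
    static Map<String, Integer> wordCount(List<String> lines, int numReducers) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (TreeMap<String, List<Integer>> reducerInput : shuffle(pairs, numReducers)) {
            for (Map.Entry<String, List<Integer>> group : reducerInput.entrySet()) {
                totals.put(group.getKey(), reduce(group.getValue()));
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick brown fox", "the lazy dog", "The fox");
        System.out.println(wordCount(lines, 3));
    }
}
```

With the three sample lines in main, "the" is counted three times and "fox" twice, regardless of how many reducers are simulated, because every occurrence of a key lands in the same partition before it is summed.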


More information on MapReduce can be found on

Getting ready

The complete code for this example is found here: ...
