Understanding stream groupings

Based on the previous example, you may wonder why we did not increase the parallelism of ReportBolt. The answer is that doing so would not make sense. To see why, you need to understand the concept of stream groupings in Storm.

A stream grouping defines how a stream's tuples are distributed among bolt tasks in a topology. For example, in the parallelized version of the word count topology, the SplitSentenceBolt class was assigned four tasks. The stream grouping determines which one of those four tasks receives a given tuple.
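To make the idea of "choosing a task for each tuple" concrete, the following is a minimal conceptual sketch in plain Java. It does not use Storm's API and is not how Storm implements groupings; the class GroupingSketch and the methods shuffleTask and keyedTask are hypothetical names introduced only for illustration. It contrasts a random choice of task with a key-based choice in which equal keys always reach the same task.

```java
import java.util.Random;

// Conceptual sketch only: NOT Storm's implementation or API.
// All names here (GroupingSketch, shuffleTask, keyedTask) are hypothetical.
final class GroupingSketch {

    // Shuffle-style selection: any of the target tasks may receive the tuple.
    static int shuffleTask(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    // Key-based selection: the same key always maps to the same task index.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int keyedTask(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // mirrors the four tasks mentioned in the text
        Random random = new Random();
        String[] words = {"the", "cow", "the", "moon"};

        for (String word : words) {
            System.out.printf("word=%s shuffle->task %d, keyed->task %d%n",
                    word,
                    shuffleTask(random, numTasks),
                    keyedTask(word, numTasks));
        }
    }
}
```

Running the sketch several times should show the shuffle column varying between runs, while the keyed column sends both occurrences of "the" to the same task index. That distinction between "any task will do" and "a particular task must receive this tuple" is what a stream grouping expresses.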

Storm defines seven built-in stream groupings:

  • Shuffle grouping: This randomly distributes tuples across the target bolt's tasks such that each bolt receives ...
