In this chapter, you’ll see how to pass tuples between the different components of a Storm topology, and how to deploy a topology into a running Storm cluster.
One of the most important decisions when designing a topology is how
data is exchanged between components, that is, how streams are
consumed by the bolts. A stream grouping specifies which stream(s)
each bolt consumes and how the tuples of those streams are
distributed among the bolt's instances.
A node can emit more than one stream of data. A stream grouping allows us to choose which stream to receive.
The stream grouping is set when the topology is defined, as we saw in Chapter 2:
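A minimal sketch of such a definition follows. The component IDs `word-normalizer` and `word-reader` and the `WordNormalizer` bolt are illustrative assumptions, and the imports assume the `org.apache.storm` package layout (older releases used `backtype.storm`):

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class GroupingSketch {

    // Hypothetical bolt, for illustration only: lower-cases each incoming word.
    public static class WordNormalizer extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            collector.emit(new Values(input.getString(0).toLowerCase()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // "word-reader" is assumed to be the ID of a spout registered elsewhere
        // with builder.setSpout(...); it is not defined in this sketch.
        builder.setBolt("word-normalizer", new WordNormalizer())
               .shuffleGrouping("word-reader");
    }
}
```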
In the preceding code block, a bolt is set on the topology builder, and then a source is set using the shuffle stream grouping. A stream grouping normally takes the source component ID as a parameter, and optionally other parameters as well, depending on the kind of stream grouping.
There can be more than one source per
InputDeclarer, and each source can be grouped
with a different stream grouping.
Shuffle Grouping is the most commonly used grouping. It takes a single parameter (the source component) and sends each tuple emitted by the source to a randomly chosen bolt, guaranteeing that each consumer receives the same number of tuples.
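As a rough illustration of how random selection and even distribution can coexist, the following standalone sketch walks a shuffled list of consumer indices and reshuffles it after every full pass. It is a simplified model, not Storm's own implementation; the class `ShuffleGroupingSketch` and its methods are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleGroupingSketch {
    private final List<Integer> order = new ArrayList<>();
    private final Random random;
    private int next = 0;

    public ShuffleGroupingSketch(int taskCount, long seed) {
        if (taskCount <= 0) {
            throw new IllegalArgumentException("taskCount must be positive");
        }
        for (int i = 0; i < taskCount; i++) {
            order.add(i);
        }
        random = new Random(seed);
        Collections.shuffle(order, random);
    }

    // Returns the index of the consumer that receives the next tuple.
    public int chooseTask() {
        if (next == order.size()) {
            // Every consumer has received one tuple in this pass: reshuffle.
            Collections.shuffle(order, random);
            next = 0;
        }
        return order.get(next++);
    }

    // Routes tupleCount tuples and reports how many each consumer received.
    public static int[] distribute(int taskCount, int tupleCount, long seed) {
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(taskCount, seed);
        int[] received = new int[taskCount];
        for (int i = 0; i < tupleCount; i++) {
            received[grouping.chooseTask()]++;
        }
        return received;
    }

    public static void main(String[] args) {
        // 1000 tuples over 4 consumers: prints [250, 250, 250, 250]
        System.out.println(Arrays.toString(distribute(4, 1000, 42L)));
    }
}
```

The order in which consumers are picked is random within each pass, yet the counts never differ by more than one, and they are exactly equal whenever the number of tuples is a multiple of the number of consumers.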
The shuffle grouping is useful for doing atomic operations such as a math operation. ...