Chapter 3. Topologies

In this chapter, you’ll see how to pass tuples between the different components of a Storm topology, and how to deploy a topology into a running Storm cluster.

Stream Grouping

One of the most important things that you need to do when designing a topology is to define how data is exchanged between components (how streams are consumed by the bolts). A Stream Grouping specifies which stream(s) are consumed by each bolt and how the stream will be consumed.

Tip

A node can emit more than one stream of data. A stream grouping allows us to choose which stream to receive.

The stream grouping is set when the topology is defined, as we saw in Chapter 2:

...
    builder.setBolt("word-normalizer", new WordNormalizer())
        .shuffleGrouping("word-reader");
...

In the preceding code block, a bolt is set on the topology builder, and then a source is set using the shuffle stream grouping. A stream grouping normally takes the source component ID as a parameter, and optionally other parameters as well, depending on the kind of stream grouping.

Tip

There can be more than one source per InputDeclarer, and each source can be grouped with a different stream grouping.

Shuffle Grouping

Shuffle Grouping is the most commonly used grouping. It takes a single parameter (the source component) and sends each tuple emitted by the source to a randomly chosen bolt warranting that each consumer will receive the same number of tuples.

The shuffle grouping is useful for doing atomic operations such as a math operation. ...

Get Getting Started with Storm now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.