Chapter 5. Bolts

As you have seen, bolts are key components in a Storm cluster. In this chapter, you’ll look at a bolt’s lifecycle, some strategies for bolt design, and some examples of how to implement bolts.

Bolt Lifecycle

A bolt is a component that takes tuples as input and produces tuples as output. When writing a bolt, you will usually implement the IRichBolt interface. Bolts are created on the client machine, serialized into the topology, and submitted to the cluster's master node (Nimbus). The cluster then launches workers that deserialize each bolt, call prepare on it, and start processing tuples.

Tip

To customize a bolt, you should set parameters in its constructor and save them as instance variables so they will be serialized when submitting the bolt to the cluster.
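
The round trip described above can be sketched in plain Java, without any Storm dependency. This is only an illustration under stated assumptions: DelimiterBolt, BoltLifecycleDemo, shipToWorker, and run are hypothetical names, the prepare and execute methods here are simplified stand-ins rather than the IRichBolt signatures, and an in-memory byte array stands in for the trip from the client to a worker. The point is that a constructor parameter stored in an instance variable survives serialization, while runtime-only state is rebuilt in prepare.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a bolt; it does not use the Storm API.
class DelimiterBolt implements Serializable {
    private static final long serialVersionUID = 1L;

    // Set in the constructor on the client, so it travels with the serialized bolt.
    private final String delimiter;

    // Runtime-only state: skipped by serialization and supplied again in prepare().
    private transient List<String> emitted;

    DelimiterBolt(String delimiter) {
        this.delimiter = delimiter;
    }

    // Plays the role of prepare(): called on the worker after deserialization.
    void prepare(List<String> sink) {
        this.emitted = sink;
    }

    // Plays the role of execute(): splits one input on the configured delimiter.
    void execute(String input) {
        for (String part : input.split(Pattern.quote(delimiter))) {
            emitted.add(part);
        }
    }
}

class BoltLifecycleDemo {
    // Serializes and deserializes the bolt, imitating the client-to-worker trip.
    static DelimiterBolt shipToWorker(DelimiterBolt bolt) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bolt);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DelimiterBolt) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    // Runs the whole sequence: construct, ship, prepare, execute.
    static List<String> run(String delimiter, String input) {
        DelimiterBolt onClient = new DelimiterBolt(delimiter);
        DelimiterBolt onWorker = shipToWorker(onClient);
        List<String> sink = new ArrayList<>();
        onWorker.prepare(sink);
        onWorker.execute(input);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(",", "a,b,c")); // prints [a, b, c]
    }
}
```

The delimiter passed to the constructor is still available after deserialization, which is why the tip recommends constructor parameters. Anything that cannot or should not be serialized is better created in prepare.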

Bolt Structure

Bolts have the following methods:

declareOutputFields(OutputFieldsDeclarer declarer)

Declare the output schema for this bolt

prepare(java.util.Map stormConf, TopologyContext context, OutputCollector collector)

Called just before the bolt starts processing tuples

execute(Tuple input)

Process a single tuple of input

cleanup()

Called when a bolt is going to shut down
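
To make the role of each method concrete, here is a minimal plain-Java sketch that records the order in which the four methods are invoked. It is not the Storm interface: MiniBolt, TracingBolt, and MiniBoltDriver are hypothetical names, the parameter types are simplified to lists of strings, and the sketch assumes that declareOutputFields is invoked first, while the topology is being defined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified mirror of the four bolt methods; not the Storm interface.
interface MiniBolt {
    List<String> declareOutputFields();
    void prepare(List<String> collector);
    void execute(String input);
    void cleanup();
}

// Records every call so the invocation order can be inspected afterwards.
class TracingBolt implements MiniBolt {
    final List<String> calls = new ArrayList<>();
    private List<String> collector;

    public List<String> declareOutputFields() {
        calls.add("declareOutputFields");
        List<String> fields = new ArrayList<>();
        fields.add("upper"); // illustrative field name
        return fields;
    }

    public void prepare(List<String> collector) {
        calls.add("prepare");
        this.collector = collector;
    }

    public void execute(String input) {
        calls.add("execute");
        collector.add(input.toUpperCase(Locale.ROOT));
    }

    public void cleanup() {
        calls.add("cleanup");
    }
}

class MiniBoltDriver {
    // Drives a bolt through the assumed sequence:
    // declare the schema, prepare once, execute once per input, then clean up.
    static List<String> drive(MiniBolt bolt, List<String> inputs) {
        List<String> out = new ArrayList<>();
        bolt.declareOutputFields();
        bolt.prepare(out);
        for (String input : inputs) {
            bolt.execute(input);
        }
        bolt.cleanup();
        return out;
    }

    public static void main(String[] args) {
        TracingBolt bolt = new TracingBolt();
        List<String> inputs = new ArrayList<>();
        inputs.add("a");
        inputs.add("b");
        System.out.println(drive(bolt, inputs)); // prints [A, B]
        System.out.println(bolt.calls);
    }
}
```

The driver calls prepare once and execute once per input, matching the descriptions above: setup belongs in prepare, per-tuple work in execute, and teardown in cleanup.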

Take a look at an example of a bolt that will split sentences into words:

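class SplitSentence implements IRichBolt {
    private OutputCollector collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector; // keep the collector so execute can emit tuples
    }

    public void execute(Tuple tuple) {
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word)); // one output tuple per word
        }
    }

    public void cleanup() {} // nothing to release in this bolt

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word")); // single output field; the name is illustrative
    }
}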
