Chapter 4. Spouts
In this chapter, you’ll take a look at the most commonly used strategies for designing the entry point for a topology (a spout) and how to make spouts fault-tolerant.
Reliable versus Unreliable Messages
When designing a topology, one important thing to keep in mind is message reliability. If a message can’t be processed, you need to decide what to do with the individual message and what to do with the topology as a whole. For example, when processing bank deposits, it is important not to lose a single transaction message. But if you’re processing millions of tweets looking for some statistical metric, and one tweet gets lost, you can assume that the metric will still be fairly accurate.
In Storm, it is the author’s responsibility to guarantee message reliability according to the needs of each topology. This involves a trade-off. A reliable topology must manage lost messages, which requires more resources. A less reliable topology may lose some messages, but is less resource-intensive. Whatever the chosen reliability strategy, Storm provides the tools to implement it.
To manage reliability at the spout, you can include a message ID with the tuple at emit time (collector.emit(new Values(…),tupleId)). The spout's ack method is called when a tuple is processed correctly, and its fail method is called when processing fails. Tuple processing succeeds when the tuple is processed by all target bolts and all anchored bolts (you will learn how to anchor a bolt to a tuple in Chapter 5).
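The bookkeeping this implies on the spout side can be sketched in plain Java. This is a minimal model under stated assumptions, not Storm's API: the class PendingMessages and its methods offer, emitNext, pendingCount, and queuedCount are hypothetical names introduced for illustration, and replaying a failed message is only one possible reaction to a fail call. In a real spout, emitNext would correspond to the body of nextTuple, and ack and fail would be invoked by Storm with the message ID supplied at emit time.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical helper (not part of Storm): tracks emitted messages by ID
// so that a failed message can be replayed and an acked one forgotten.
class PendingMessages {
    private final Queue<String> toSend = new ArrayDeque<>();
    private final Map<Integer, String> pending = new HashMap<>();
    private int nextId = 0;

    // Queue a message that the spout should emit later.
    void offer(String message) {
        toSend.add(message);
    }

    // Models nextTuple(): takes the next queued message, remembers it under
    // a fresh message ID, and returns that ID (or -1 if nothing is queued).
    int emitNext() {
        String message = toSend.poll();
        if (message == null) {
            return -1;
        }
        int id = nextId++;
        pending.put(id, message);
        // A real spout would emit here, passing the message ID along with
        // the tuple: collector.emit(new Values(message), id);
        return id;
    }

    // Models ack(msgId): the tuple was processed correctly, so drop it.
    void ack(int id) {
        pending.remove(id);
    }

    // Models fail(msgId): processing failed, so queue the message again.
    void fail(int id) {
        String message = pending.remove(id);
        if (message != null) {
            toSend.add(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int queuedCount() {
        return toSend.size();
    }
}
```

Keeping the message in the pending map until it is acked is what makes the replay possible. A spout that emits without a message ID would skip this bookkeeping entirely, which uses fewer resources but gives up the ability to recover lost messages.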
Tuple processing ...