Chapter 6. Recipe: Combine At-Least-Once Delivery with Idempotent Processing to Achieve Exactly-Once Semantics

Idea in Brief

When dealing with streams of data in the face of possible failure, processing each datum exactly once is extremely difficult. When the processing system fails, it may not be easy to determine which data was successfully processed and which data was not.

Traditional approaches to this problem are complex, require strongly consistent processing systems, and require smart clients that can determine through introspection what has or hasn’t been processed.

As strongly consistent systems have become scarcer and throughput needs have skyrocketed, these traditional approaches have often been deemed unwieldy and impractical. Many have given up on precise answers and chosen to work toward answers that are as correct as possible under the circumstances. The Lambda Architecture proposes doing all calculations twice, in two different ways, to allow for cross-checking. Conflict-free replicated data types (CRDTs) have been proposed as a way to add data structures that can be reasoned about when using eventually consistent data stores.

If these options are less than ideal, idempotency offers another path.

An idempotent operation is an operation that has the same effect no matter how many times it is applied. The simplest example is setting a value. If I set x = 5, then I set x = 5 again, the second action doesn’t have any effect. How does this relate to exactly-once processing? For ...
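The set-a-value example can be sketched in a few lines of Python. This is only an illustration, not code from the book: the helper names `apply_set` and `apply_increment` and the dictionary used as state are assumptions made for the sketch. It contrasts setting a value, which is idempotent, with incrementing one, which is not.

```python
def apply_set(state, key, value):
    """Idempotent: repeating the call leaves the state unchanged."""
    new_state = dict(state)  # copy so the caller's state is not mutated
    new_state[key] = value
    return new_state


def apply_increment(state, key, amount):
    """Not idempotent: every repeat changes the state again."""
    new_state = dict(state)
    new_state[key] = new_state.get(key, 0) + amount
    return new_state


state = {"x": 0}

# Setting x = 5 twice gives the same state as setting it once.
once = apply_set(state, "x", 5)
twice = apply_set(once, "x", 5)
set_is_idempotent = (once == twice)

# Incrementing by 5 twice does not give the same state as doing it once.
inc_once = apply_increment(state, "x", 5)
inc_twice = apply_increment(inc_once, "x", 5)
increment_is_idempotent = (inc_once == inc_twice)

print(set_is_idempotent, increment_is_idempotent)
```

If a message that triggers `apply_set` is delivered a second time, the state is unaffected, whereas a redelivered increment would be counted twice.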
