Chapter 3: Implementing Pipelines Using Stateful Processing

In the previous chapter, we focused on implementing pipelines that used high-level transformations. Such transforms tend to require only a few parameters or methods to be implemented before they can be used, but this simplicity comes at the expense of somewhat limited flexibility. Let's demonstrate this using the GroupByKey transform. It is defined quite simply as a transform that wraps elements with the same key into an Iterable object. This Iterable (essentially nothing more than a bag of elements) is then emitted when triggered according to the windowing strategy. Nothing more, nothing less. But what if we need finer control? What if we want to control exactly when we emit the ...
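To make the GroupByKey description above concrete, here is a minimal plain-Java sketch of the grouping semantics. It does not use the Apache Beam API. The `GroupByKeySketch` class, the `Event` type, and the `groupByKeyInFixedWindows` method are hypothetical names introduced only for illustration. It also assumes fixed, non-overlapping windows and emits each group once, after all input has been seen, which is a simplification of how a real trigger behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByKeySketch {

  /** A keyed value with an event timestamp (hypothetical type, not a Beam class). */
  static final class Event {
    final String key;
    final int value;
    final long timestampMillis;

    Event(String key, int value, long timestampMillis) {
      this.key = key;
      this.value = value;
      this.timestampMillis = timestampMillis;
    }
  }

  /**
   * Collects values that share a key and fall into the same fixed window.
   * Each group is identified by "key@windowStartMillis"; its list plays the
   * role of the Iterable "bag of elements" described in the text.
   */
  static Map<String, List<Integer>> groupByKeyInFixedWindows(
      List<Event> events, long windowSizeMillis) {
    Map<String, List<Integer>> grouped = new TreeMap<>();
    for (Event e : events) {
      // Align the timestamp down to the start of its window.
      long windowStart =
          e.timestampMillis - Math.floorMod(e.timestampMillis, windowSizeMillis);
      String groupId = e.key + "@" + windowStart;
      grouped.computeIfAbsent(groupId, k -> new ArrayList<>()).add(e.value);
    }
    return grouped;
  }

  public static void main(String[] args) {
    List<Event> events = List.of(
        new Event("a", 1, 0L),
        new Event("b", 2, 500L),
        new Event("a", 3, 900L),
        new Event("a", 4, 1200L));
    // One-second windows: "a" has two groups, "b" has one.
    groupByKeyInFixedWindows(events, 1000L)
        .forEach((group, values) -> System.out.println(group + " -> " + values));
  }
}
```

The caller has no say in when a group is emitted or what happens to elements while they wait: the grouping and the emission are fixed by the key and the window alone. That is the limitation the questions above point at.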

Building Big Data Pipelines with Apache Beam by Jan Lukavský
