Learning Apache Apex
by Ananth Gundabattula, Thomas Weise, Munagala V. Ramanath, David Yan, Kenneth Knowles
Watermarks in Beam
Put simply, a watermark is an estimate of the oldest data you still expect to see. As real time passes during your pipeline's execution, a Beam runner (here, Apex) maintains this estimate for each PCollection in your pipeline. In streaming applications, a watermark generally trails wall-clock time by a little and advances irregularly as queued incoming data is processed.
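To make the idea concrete, here is a minimal toy sketch in plain Python. It is not Beam's or Apex's API: the class `ToyWatermarkTracker`, its methods, and the `demo` function are hypothetical names invented for illustration. It assumes the estimate is simply the oldest event time still waiting in a queue (or the current wall-clock time when the queue is empty) and that the watermark never moves backwards.

```python
from collections import deque


class ToyWatermarkTracker:
    """Hypothetical toy model of a watermark; not a Beam or Apex class."""

    def __init__(self):
        self._pending = deque()            # event times queued but not yet processed
        self._watermark = float("-inf")    # current estimate; never decreases

    def enqueue(self, event_time):
        """Record an element that has arrived but is still waiting to be processed."""
        self._pending.append(event_time)

    def process_one(self):
        """Process (remove) the element at the head of the queue."""
        return self._pending.popleft()

    def watermark(self, now):
        """Estimate the oldest event time still expected, given wall-clock time `now`."""
        # Assumption: the oldest unprocessed element bounds the estimate;
        # with nothing queued, the estimate catches up to wall-clock time.
        oldest_pending = min(self._pending, default=now)
        candidate = min(oldest_pending, now)
        # The estimate only ever moves forward, even if older data shows up later.
        self._watermark = max(self._watermark, candidate)
        return self._watermark


def demo():
    """Trace the watermark as three queued elements are processed one by one."""
    tracker = ToyWatermarkTracker()
    for event_time in (10, 12, 11):
        tracker.enqueue(event_time)

    trace = [tracker.watermark(now=15)]
    for now in (16, 17, 18):
        tracker.process_one()
        trace.append(tracker.watermark(now=now))
    return trace


if __name__ == "__main__":
    print(demo())
```

In this trace the estimate lags the wall clock while elements are queued, stalls when an older element is still waiting, and then jumps once the queue drains. A real runner derives its estimate from its sources and from upstream PCollections, so treat this only as an intuition aid.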
In the following illustration, processing time—the time that passes as your computation proceeds—is on the vertical axis. Event time—the time as recorded in your data stream—is on the horizontal axis. The red curve is the progress of the watermark. You can trace this by considering processing time proceeding upwards, while the estimate of Beam runner ...