Chapter 6. Time-Based and Window Operators

In this chapter, we will cover DataStream API methods for time handling and time-based operators, like windows. As you learned in “Time Semantics”, Flink’s time-based operators can be applied with different notions of time.

First, we will learn how to define time characteristics, timestamps, and watermarks. Then, we will cover the process functions, low-level transformations that provide access to timestamps and watermarks and can register timers. Next, we will get to use Flink’s window API, which provides built-in implementations of the most common window types. You will also get an introduction to custom, user-defined window operations and core windowing constructs, such as assigners, triggers, and evictors. Finally, we will discuss how to join streams on time and strategies to handle late events.

Configuring Time Characteristics

To define time operations in a distributed stream processing application, it is important to understand the meaning of time. When you specify a window to collect events in one-minute buckets, which events exactly will each bucket contain? In the DataStream API, you can use the time characteristic to tell Flink how to define time when you are creating windows. The time characteristic is a property of the StreamExecutionEnvironment and it takes the following values:

ProcessingTime

specifies that operators determine the current time of the data stream according to the system clock of the machine where they ...

Get Stream Processing with Apache Flink now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.