Learning Apache Apex
by Ananth Gundabattula, Thomas Weise, Munagala V. Ramanath, David Yan, Kenneth Knowles
State Management
Transformations such as dedup, join, and windowed accumulation require state. In a pipeline that processes massive amounts of data, the state required for these transformations can grow very large. It may not fit into the operator's JVM heap and, even if it does, the operator is not fault tolerant unless the state can be restored from durable storage. In addition, latency matters in streaming use cases, which imposes further requirements on the state management component. We need a solution that is fast, scalable, and fault tolerant.
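To make the problem concrete, here is a minimal sketch of a dedup transformation in plain Java. The class DedupSketch and its methods are hypothetical and are not part of the Apex API. Java serialization into a byte array stands in for writing a checkpoint to durable storage. The transformation must remember every key it has seen, so its state grows with the number of distinct keys, and a restarted instance detects duplicates correctly only if it restores that state.

```java
import java.io.*;
import java.util.*;

/**
 * Hypothetical, minimal dedup transformation (not an Apex API).
 * All state lives on the JVM heap and grows with the number of distinct keys.
 */
public class DedupSketch {
  // Every key seen so far; unbounded in this sketch.
  private HashSet<String> seen = new HashSet<>();

  /** Returns true if the key is new and should be emitted downstream. */
  public boolean process(String key) {
    return seen.add(key);
  }

  /** Serializes the whole state; a real system would write this to durable storage. */
  public byte[] checkpoint() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(seen);
    }
    return bytes.toByteArray();
  }

  /** Replaces the in-memory state with a previously taken checkpoint. */
  @SuppressWarnings("unchecked")
  public void restore(byte[] snapshot) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
      seen = (HashSet<String>) in.readObject();
    }
  }

  /** Number of keys currently held in state. */
  public int stateSize() {
    return seen.size();
  }

  public static void main(String[] args) throws Exception {
    DedupSketch op = new DedupSketch();
    List<String> emitted = new ArrayList<>();
    for (String k : List.of("a", "b", "a", "c", "b")) {
      if (op.process(k)) {
        emitted.add(k);
      }
    }
    System.out.println(emitted); // [a, b, c]

    byte[] snapshot = op.checkpoint();
    DedupSketch recovered = new DedupSketch(); // simulates a restarted operator
    recovered.restore(snapshot);
    System.out.println(recovered.process("a")); // false: duplicate still detected
  }
}
```

This naive version keeps all state on the heap and checkpoints it as a single blob, which is exactly the approach that breaks down once the state becomes very large.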
For this purpose, the Apex library provides a utility called Managed State. It can persist large amounts of data on the distributed file system while allowing for asynchronous ...