Chapter 4. Channels
Channels are buffers that sit between sources and sinks, which allows sources and sinks to operate at different rates. Channels are key to Flume’s guarantee of not losing data, provided they are configured properly. Sources write data to one or more channels, which are read by one or more sinks. A sink can read from only one channel, but multiple sinks can read from the same channel for better performance. Channels have transactional semantics that allow Flume to provide explicit guarantees about the data written to a channel.
Having a channel operate as a buffer between sources and sinks has several advantages. Sources writing to a channel can have their own threading models without regard to the sinks reading from that channel, and vice versa. The buffer also allows sources and sinks to operate at different rates, since writes happen at the tail of the buffer and reads happen off the head. This allows Flume agents to absorb “peak hour” loads from the sources, even if the sinks are unable to drain the channels immediately.
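The buffering idea can be illustrated with a minimal sketch that uses only java.util.concurrent. It is not Flume's Channel API and has no transactions. The class ChannelBufferSketch, its run method, and the event names are hypothetical, chosen only for illustration. A "source" thread writes events as fast as it can to the tail of a bounded FIFO queue, while a deliberately slow "sink" thread reads them off the head.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a channel: a bounded FIFO buffer between one
// "source" thread and one "sink" thread. This is not Flume's Channel API.
final class ChannelBufferSketch {

    /** What the sink observed during one run. */
    static final class Result {
        final List<String> drained = new ArrayList<>(); // events in the order the sink read them
        int peakBacklog;                                 // largest queue size the sink saw
    }

    static Result run(int events, int capacity, long sinkDelayMillis) {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(capacity);
        Result result = new Result();

        // The source writes at full speed; put() appends at the tail of the
        // queue and blocks only while the queue is full.
        Thread source = new Thread(() -> {
            try {
                for (int i = 0; i < events; i++) {
                    channel.put("event-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The sink is slower; take() removes from the head of the queue and
        // blocks only while the queue is empty.
        Thread sink = new Thread(() -> {
            try {
                for (int i = 0; i < events; i++) {
                    result.peakBacklog = Math.max(result.peakBacklog, channel.size());
                    result.drained.add(channel.take());
                    Thread.sleep(sinkDelayMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        sink.start();
        try {
            source.join();
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = run(50, 100, 2);
        System.out.println("drained=" + r.drained.size() + ", peak backlog=" + r.peakBacklog);
    }
}
```

Neither thread knows anything about the other's pace or threading; they share only the queue. The backlog that builds up while the sink lags is the burst being absorbed. In this sketch a full queue simply blocks the source, which is a property of ArrayBlockingQueue rather than a statement about how a Flume channel behaves at capacity.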
Multiple sources and sinks can operate on the same channel. Channels are transactional in nature: each write to a channel and each read from a channel happens within the context of a transaction. Events from a write transaction become readable by sinks only once that transaction is committed. Also, if a sink has successfully taken