Skip to Main Content
Using Flume
book

Using Flume

by Hari Shreedharan
September 2014
Intermediate to advanced content levelIntermediate to advanced
238 pages
6h 17m
English
O'Reilly Media, Inc.
Content preview from Using Flume

Chapter 4. Channels

Channels are buffers that sit in between sources and sinks. As such, channels allow sources and sinks to operate at different rates. Channels are key to Flume’s guarantees of not losing data (of course, when configured properly). Sources write data to one or more channels, which are read by one or more sinks. A sink can read only from one channel, while multiple sinks can read from the same channel for better performance. Channels have transactional semantics that allow Flume to provide explicit guarantees about the data written in a channel.

Having a channel operating as a buffer between sources and sinks has several advantages. The channel allows sources operating on the same channel to have their own threading models without being worried about the sinks reading from the channel, and vice versa. Having a buffer in between the sources and the sinks also allows them to operate at different rates, since the writes happen at the tail of the buffer and reads happen off the head. This also allows the Flume agents to handle “peak hour” loads from the sources, even if the sinks are unable to drain the channels immediately.

Channels allow multiple sources and sinks to operate on them. Channels are transactional in nature. Each write to a channel and each read from a channel happens within the context of a transaction. Only once a write transaction is committed will the events from that transaction be readable by any sinks. Also, if a sink has successfully taken

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Apache Flume: Distributed Log Collection for Hadoop - Second Edition - Second Edition

Apache Flume: Distributed Log Collection for Hadoop - Second Edition - Second Edition

Steven Hoffman
Java Data Objects

Java Data Objects

David Jordan, Craig Russell

Publisher Resources

ISBN: 9781491905326ErrataSupplemental Content