Using Flume

by Hari Shreedharan
September 2014
Content level: Intermediate to advanced
238 pages
6h 17m
English
O'Reilly Media, Inc.

Chapter 5. Sinks

Flume is designed so that practically every component can be plugged in, including the components that write data out to its eventual destination, which in most cases is some data store.

The component that removes data from a Flume agent and writes it to another agent, a data store, or some other system is called a sink. Flume lets the user configure which sink to use: either one of the sinks bundled with Flume or one written by the user. For custom sinks that are not built into Flume, the JARs should be dropped into Flume’s plugins.d directory.
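
In a Flume agent's configuration, a sink's `type` property takes one of two forms. It can be the short alias of a bundled sink (for example `logger`), or it can be the fully qualified class name of a custom sink. The sketch below models that lookup in plain Java, without the Flume libraries. The `Sink` interface, `LoggerSink`, `UppercaseSink`, and `createSink` are hypothetical stand-ins, not Flume's actual API.

```java
import java.util.Map;
import java.util.function.Supplier;

public class SinkPluginDemo {

    // Hypothetical minimal sink contract (NOT Flume's real org.apache.flume.Sink).
    public interface Sink {
        void configure(Map<String, String> props);
        String write(String event);
    }

    // Stand-in for a sink that ships bundled with the framework.
    public static class LoggerSink implements Sink {
        public void configure(Map<String, String> props) { }
        public String write(String event) { return "LOG: " + event; }
    }

    // Stand-in for a user-written custom sink delivered in its own JAR.
    public static class UppercaseSink implements Sink {
        private String prefix = "";
        public void configure(Map<String, String> props) {
            prefix = props.getOrDefault("prefix", "");
        }
        public String write(String event) { return prefix + event.toUpperCase(); }
    }

    // Short aliases for built-in sinks; any other type is treated as a class name.
    private static final Map<String, Supplier<Sink>> BUILT_INS =
            Map.of("logger", LoggerSink::new);

    public static Sink createSink(String type, Map<String, String> props) throws Exception {
        Supplier<Sink> builtIn = BUILT_INS.get(type);
        Sink sink;
        if (builtIn != null) {
            sink = builtIn.get();
        } else {
            // Custom sink: load the class by name from the classpath, which is
            // roughly where JARs placed in plugins.d end up for a Flume agent.
            Class<? extends Sink> clazz = Class.forName(type).asSubclass(Sink.class);
            sink = clazz.getDeclaredConstructor().newInstance();
        }
        sink.configure(props);
        return sink;
    }

    public static void main(String[] args) throws Exception {
        Sink bundled = createSink("logger", Map.of());
        Sink custom = createSink(UppercaseSink.class.getName(), Map.of("prefix", ">> "));
        System.out.println(bundled.write("hello")); // LOG: hello
        System.out.println(custom.write("hello"));  // >> HELLO
    }
}
```

Resolving the class by name at startup is what allows a new sink to be added without rebuilding the framework. Only the JAR and the configuration need to change.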

Sinks are the components in a Flume agent that continually drain the channel, so that the sources can keep receiving events and writing them to the channel. Sinks poll the channel for events and remove them in batches. Each batch is then either written to a storage or indexing system or sent to another Flume agent.
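
The polling-and-batching loop can be sketched with a plain `BlockingQueue` standing in for the channel. The `Status` values echo the `READY`/`BACKOFF` result a Flume sink reports after each processing pass. The class and method names here are assumptions for illustration. This sketch also removes events immediately, with no transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BatchDrainDemo {

    public enum Status { READY, BACKOFF }

    private final BlockingQueue<String> channel;
    private final int batchSize;
    // Stand-in for the external destination (HDFS, HBase, another agent, ...).
    private final List<List<String>> written = new ArrayList<>();

    public BatchDrainDemo(BlockingQueue<String> channel, int batchSize) {
        this.channel = channel;
        this.batchSize = batchSize;
    }

    // One polling pass: remove up to batchSize events and write them as one batch.
    public Status process() {
        List<String> batch = new ArrayList<>(batchSize);
        channel.drainTo(batch, batchSize);
        if (batch.isEmpty()) {
            // Channel was empty; the caller should wait before polling again.
            return Status.BACKOFF;
        }
        written.add(batch);
        return Status.READY;
    }

    public List<List<String>> written() { return written; }

    public static void main(String[] args) {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(100);
        for (int i = 1; i <= 7; i++) channel.add("e" + i);

        BatchDrainDemo sink = new BatchDrainDemo(channel, 3);
        // Keep draining while the sink reports that it found work.
        while (sink.process() == Status.READY) { }
        System.out.println(sink.written()); // [[e1, e2, e3], [e4, e5, e6], [e7]]
    }
}
```

A real runner would sleep for some interval after a `BACKOFF` rather than exit. Polling an empty channel in a tight loop would waste CPU.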

Sinks are fully transactional. Each sink starts a transaction with the channel before removing a batch of events from it. Only after the batch has been successfully written to storage or to the next Flume agent does the sink commit the transaction. The channel then removes those events from its own internal buffers.
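
The take/commit/rollback cycle can be illustrated with a toy in-memory channel that holds taken events aside until the transaction finishes. `TxChannel`, `begin`, `take`, `commit`, `rollback`, and `drainBatch` are hypothetical names. They only loosely mirror the shape of a channel transaction, and the destination failure is simulated with a predicate.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

public class TxSinkDemo {

    // Toy channel: events taken inside a transaction stay "in flight" until
    // commit (forgotten) or rollback (returned to the head of the queue).
    public static class TxChannel {
        private final Deque<String> queue = new ArrayDeque<>();
        private final List<String> inFlight = new ArrayList<>();

        public void put(String event) { queue.addLast(event); }
        public int size() { return queue.size(); }

        public void begin() { inFlight.clear(); }

        public String take() {
            String e = queue.pollFirst();
            if (e != null) inFlight.add(e);
            return e;
        }

        public void commit() { inFlight.clear(); }

        public void rollback() {
            // Re-insert in reverse so the original event order is preserved.
            for (int i = inFlight.size() - 1; i >= 0; i--) queue.addFirst(inFlight.get(i));
            inFlight.clear();
        }
    }

    // Drain one batch; `destination` returns false to simulate a failed write.
    public static boolean drainBatch(TxChannel ch, int batchSize, Predicate<List<String>> destination) {
        ch.begin();
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            String e = ch.take();
            if (e == null) break;
            batch.add(e);
        }
        if (destination.test(batch)) {
            ch.commit();   // write succeeded: the channel may now drop these events
            return true;
        }
        ch.rollback();     // write failed: events go back to the channel for a retry
        return false;
    }

    public static void main(String[] args) {
        TxChannel ch = new TxChannel();
        for (String e : List.of("a", "b", "c")) ch.put(e);

        boolean ok = drainBatch(ch, 2, batch -> false);        // destination is down
        System.out.println(ok + " " + ch.size());               // false 3

        List<List<String>> store = new ArrayList<>();
        ok = drainBatch(ch, 2, batch -> store.add(batch));
        System.out.println(ok + " " + ch.size() + " " + store); // true 1 [[a, b]]
    }
}
```

With this design, events are never lost when a write fails. However, a batch that was partly written before the failure is retried as a whole, so the destination may receive duplicates.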

Flume comes packaged with a number of sinks that can write to storage and indexing systems such as HDFS, HBase, Solr, and Elasticsearch. These sinks are generally referred to as terminal ...

ISBN: 9781491905326