Hadoop 2.x Administration Cookbook by Gurmukh Singh

Configuring Flume

In this recipe, we will cover how to configure Flume for data ingestion. Flume is a general-purpose tool for ingesting streaming data, such as log streams or Twitter feeds.

An organization might have hundreds of web servers serving pages, and we may need to parse their logs quickly for ad targeting or to trigger events. These Apache web server logs can be streamed to Flume, which continuously uploads them to HDFS for processing.

In simple terms, Flume is a distributed, reliable, and efficient way of collecting and aggregating data into HDFS. It is built around Flume agents, sources, channels, and sinks, which together make a robust system. There can be multiple sources, channels, and output paths, such as an HDFS or non-HDFS filesystem, ...
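
To make the source, channel, and sink roles concrete, here is a minimal sketch of a single-agent configuration that tails an Apache access log and writes it to HDFS. It is an illustration rather than the configuration used in this recipe: the agent name a1, the component names r1, c1, and k1, the log path, the NameNode address, and the capacity values are all assumptions to adapt to your environment.

```properties
# Hypothetical agent "a1" with one source, one channel, and one sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow the web server log (the path is an assumption).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink (sizes are illustrative).
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Sink: write events to HDFS, one directory per day.
# The NameNode host and target directory are placeholders.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
# Use the agent's clock for the date escapes, since exec events carry no timestamp header.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

Assuming this is saved as weblog.conf (a hypothetical filename), the agent could be started with: flume-ng agent --conf conf --conf-file weblog.conf --name a1. A memory channel is fast but loses buffered events if the agent stops; a file channel is the usual alternative when durability matters more than throughput.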