Hadoop Blueprints by Tanmay Deshpande, Anurag Shrivastava


Apache Flume

A data lake can be filled with data arriving from multiple sources at different speeds. Tools in the ingestion tier, such as Apache Flume, can handle this massive volume of incoming data and store it on HDFS.

Apache Flume is a distributed and scalable tool that reliably collects data from different sources and moves it to a centralized data store on HDFS. Massive volumes of data can be generated in the form of weblogs or sensor data, then stored on HDFS for analysis and distribution. Although the typical use cases of Apache Flume involve the collection and storage of log data, it can be used to ingest any kind of data into HDFS.

Understanding the Design of Flume

Flume is an agent-based system. Each agent contains three components:

  • Source: The source receives events from an external data producer, such as a log file or a network port, and places them on one or more channels.
  • Channel: The channel is a passive buffer that holds events between the source and the sink, so that the two can operate at different speeds.
  • Sink: The sink removes events from the channel and delivers them to their destination, such as HDFS, or forwards them to the source of another agent.
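As a sketch, a minimal agent configuration wires these three components together in a Java properties file. The agent name (`agent1`), the tailed log file, and the HDFS path below are illustrative assumptions, not values from the book:

```properties
# Name the components of this (hypothetical) agent
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: tail an application log file (path is an example)
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# Channel: buffer events in memory between source and sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000
agent1.channels.ch1.transactionCapacity = 1000

# Sink: write events to HDFS, partitioned by date
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.channel = ch1
```

Saved as, say, `agent1.conf`, this can be started with `flume-ng agent --conf conf --conf-file agent1.conf --name agent1`. Note that a source writes to channels (plural), while a sink reads from exactly one channel.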
