Apache Flume

Apache Flume is an open source system that was primarily developed to solve the following use case:

How to efficiently and reliably collect large amounts of log-related data from different systems, normalize it, and store it in a reliable store.

At first glance, the use case seems simple enough to question the need to develop an entire system around it. But when you are developing a distributed, reliable, and fault-tolerant system that spans multiple machines running in different regions, the simple use case of aggregating logs from different machines and different application instances suddenly seems humongous.

You must keep a lot of things in mind. For example:

  • All the systems that deploy your distributed application should have ...
