Book description
How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you’ll learn Flume’s rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elasticsearch, and other systems.
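Flume agents are configured declaratively in a properties file that wires sources, channels, and sinks together. As a minimal sketch of the kind of pipeline the book covers (the agent name `a1` and component names `r1`, `c1`, `k1` are illustrative, not taken from the book):

```properties
# Hypothetical agent "a1": a netcat source feeding an HDFS sink via a memory channel
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens for newline-delimited events on TCP port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: writes events to date-partitioned directories on HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y/%m/%d
a1.sinks.k1.channel = c1
```

Swapping the memory channel for a file channel trades throughput for the durability behind Flume's no-data-loss guarantee, a topic Chapter 2 discusses in detail.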
Table of contents
- Foreword
- Preface
- 1. Apache Hadoop and Apache HBase: An Introduction
- 2. Streaming Data Using Apache Flume
- The Need for Flume
- Is Flume a Good Fit?
- Inside a Flume Agent
- Configuring Flume Agents
- Getting Flume Agents to Talk to Each Other
- Complex Flows
- Replicating Data to Various Destinations
- Dynamic Routing
- Flume’s No Data Loss Guarantee, Channels, and Transactions
- Agent Failure and Data Loss
- The Importance of Batching
- What About Duplicates?
- Running a Flume Agent
- Summary
- References
- 3. Sources
- 4. Channels
- 5. Sinks
- 6. Interceptors, Channel Selectors, Sink Groups, and Sink Processors
- 7. Getting Data into Flume
- 8. Planning, Deploying, and Monitoring Flume
- Index
Product information
- Title: Using Flume
- Author(s):
- Release date: September 2014
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781449368302