Chapter 1. Overview and Architecture

If you are reading this book, chances are you are swimming in oceans of data. Creating mountains of data has become very easy, thanks to Facebook, Twitter, Amazon, digital cameras and camera phones, YouTube, Google, and just about anything else you can think of being connected to the Internet. As a provider of a website, 10 years ago, your application logs were only used to help you troubleshoot your website. Today, this same data can provide a valuable insight into your business and customers if you know how to pan gold out of your river of data.

Furthermore, as you are reading this book, you are also aware that Hadoop was created to solve (partially) the problem of sifting through mountains of data. Of course, ...

Get Apache Flume: Distributed Log Collection for Hadoop - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.