Chapter 1. Fast Data Front Ends for Hadoop

Building streaming data applications that can manage the massive quantities of data generated by mobile devices, machine-to-machine (M2M) communication, sensors, and other Internet of Things (IoT) devices is a major challenge for many organizations today.

Traditional tools, such as conventional database systems, do not have the capacity to ingest fast data, analyze it in real time, and make decisions. Newer technologies, such as Apache Spark and Apache Storm, are gaining interest as possible solutions for handling fast data streams. However, only solutions such as VoltDB provide streaming analytics with full Atomicity, Consistency, Isolation, and Durability (ACID) support.

Employing a solution such as VoltDB, which handles streaming data, maintains state, ensures durability, and supports transactions and real-time decisions, is key to benefiting from fast (and big) data.

Data ingestion is a pressing problem for any large-scale system. Several architecture options are available for cleaning and pre-processing data for efficient and fast storage. In this report, we will discuss the advantages and disadvantages of various fast data front ends for Hadoop.

Figure 1-1. Typical big data architecture

Figure 1-1 presents a high-level view of a typical big data architecture. A key component is the HDFS file store. On the left-hand side of HDFS, various data sources and systems, such as Flume and Kafka, ...
