Streaming a pipeline from Kafka to Storm to HDFS

In this section, we will see how data streams flow from Kafka through Storm into HDFS, and how to access the resulting files with a Hive external table.

The following image shows the components of the pipeline and how messages flow from Kafka through Storm to HDFS in real time:

The whole pipeline will work as follows:

  1. We will ingest customer records (customer_id, customer_firstname, and customer_lastname) into Kafka using the Kafka console-producer API (see the producer commands after this list)
  2. After that, Storm will pull the messages from Kafka
  3. A connection to HDFS will be established
  4. Storm will use the HDFS bolt to ingest the records (a topology sketch wiring these pieces together follows this list) ...
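To make step 1 concrete, here is a minimal sketch of feeding CSV records into a topic with the console producer. The topic name customers, the broker and ZooKeeper addresses, and the sample rows are illustrative assumptions, not values given in this excerpt:

```bash
# Create a topic for the customer records (topic name "customers" is assumed)
bin/kafka-topics.sh --create --topic customers \
    --partitions 1 --replication-factor 1 --zookeeper localhost:2181

# Type CSV records into the console producer, one per line:
# customer_id,customer_firstname,customer_lastname
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic customers
1,John,Doe
2,Jane,Smith
```

On newer Kafka releases (2.2+), --bootstrap-server replaces the --zookeeper and --broker-list flags shown here.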
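Steps 2 through 4 are typically wired together in a single Storm topology: a Kafka spout consumes the topic and an HdfsBolt writes the tuples out as files. The sketch below uses the storm-kafka-client and storm-hdfs modules; the hostnames, the HDFS path, and the sync/rotation sizes are assumptions chosen for illustration:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class KafkaStormHdfsTopology {
    public static void main(String[] args) throws Exception {
        // Step 2: a Kafka spout subscribed to the "customers" topic (assumed name).
        // The record translator emits only the CSV message value as a single field.
        KafkaSpoutConfig<String, String> spoutConfig = KafkaSpoutConfig
                .builder("localhost:9092", "customers")
                .setRecordTranslator(r -> new Values(r.value()), new Fields("record"))
                .build();

        // Steps 3 and 4: an HDFS bolt that writes one record per line under
        // /user/hive/customers, syncing every 100 tuples and rotating files at 5 MB.
        HdfsBolt hdfsBolt = new HdfsBolt()
                .withFsUrl("hdfs://localhost:8020")
                .withFileNameFormat(new DefaultFileNameFormat().withPath("/user/hive/customers/"))
                .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter(","))
                .withSyncPolicy(new CountSyncPolicy(100))
                .withRotationPolicy(new FileSizeRotationPolicy(5.0f, Units.MB));

        // Wire spout to bolt and submit the topology to the cluster
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);
        builder.setBolt("hdfs-bolt", hdfsBolt, 1).shuffleGrouping("kafka-spout");

        StormSubmitter.submitTopology("kafka-to-hdfs", new Config(), builder.createTopology());
    }
}
```

The sync policy controls how often buffered tuples are flushed to HDFS, while the rotation policy decides when the bolt closes the current file and starts a new one; both are tunable trade-offs between latency and small-file overhead.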
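Once files land in HDFS, the records can be exposed through a Hive external table, as this section's introduction describes. A minimal sketch, assuming the /user/hive/customers path and comma delimiter used in the topology sketch above:

```sql
-- External table over the directory the HDFS bolt writes to
-- (path and delimiter are assumptions carried over from the sketch above)
CREATE EXTERNAL TABLE customers (
  customer_id        INT,
  customer_firstname STRING,
  customer_lastname  STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/customers';

-- Query the streamed records
SELECT * FROM customers LIMIT 10;
```

Because the table is external, dropping it removes only the Hive metadata; the files Storm wrote to HDFS remain in place.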
