Hadoop Blueprints by Tanmay Deshpande, Anurag Shrivastava

Batch data analytics

Let's now look at the implementation of batch data analytics, which consists of two important elements:

  1. Loading streams of sensor data from Kafka topics to HDFS.
  2. Using Hive to perform analytics on inserted data.

Loading streams of sensor data from Kafka topics to HDFS

Let's assume that the sensors can write data to Kafka topics. A microcomputer such as the Raspberry Pi can serve as the interface between the sensors and Kafka. In this section, we will see how to read the data from Kafka topics and write it to an HDFS folder.
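As a rough illustration of the Kafka-to-HDFS step, the sketch below groups sensor messages into per-topic, per-day batches and derives a target HDFS directory for each batch. It is plain Python with no Kafka or HDFS client involved. The message layout (topic, epoch-seconds timestamp, payload), the `/data/sensors` base directory, and the function names `hdfs_path` and `group_for_hdfs` are all assumptions made for illustration and do not come from the book.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical base directory in HDFS; not taken from the original text.
BASE_DIR = "/data/sensors"


def hdfs_path(topic, epoch_seconds, base_dir=BASE_DIR):
    """Build a date-partitioned HDFS directory for one message (UTC)."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{base_dir}/{topic}/dt={day}"


def group_for_hdfs(messages):
    """Group (topic, epoch_seconds, payload) tuples by target directory.

    Returns a dict mapping each directory to the newline-terminated
    payloads that a real loader might write there as a single file.
    """
    batches = defaultdict(list)
    for topic, epoch_seconds, payload in messages:
        batches[hdfs_path(topic, epoch_seconds)].append(payload)
    return {path: "\n".join(lines) + "\n" for path, lines in batches.items()}


# Illustrative messages; a real loader would receive these from a Kafka consumer.
sample = [
    ("temperature", 0, "sensor-1,21.5"),
    ("temperature", 60, "sensor-2,22.0"),
    ("temperature", 86400, "sensor-1,20.9"),
]
result = group_for_hdfs(sample)
```

The `dt=YYYY-MM-DD` directory naming is one common convention, chosen here because Hive can map such directories to table partitions. Other layouts would work equally well for this step.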

To import data from Kafka, you first need Kafka running on your machine. The following command starts Kafka and ZooKeeper:

bin/zookeeper-server-start.sh ...
