Learning Storm by Anand Nalya, Ankit Jain

Producing a training dataset into Kafka

The first step in developing a machine-learning pipeline is to get the data into a place from which we can feed it to the training algorithm. In this case study, we will use Kafka as the source of the training data.

For this, we will write a Kafka producer that streams 80 percent of the data in the data file to the Kafka broker. The remaining 20 percent will be stored in a separate file, which we will use to test the clustering model built by our topology.
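The 80/20 split described above can be sketched in plain Java before wiring in the Kafka client. The class and method names here are illustrative, not from the book; the Kafka send itself is indicated only in comments:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a dataset so the first 80 percent goes to
// Kafka for training and the remaining 20 percent is held out for testing.
public class DatasetSplitter {

    // Index separating the training portion (first 80 percent)
    // from the held-out test portion (remaining 20 percent).
    static int trainingSize(int totalLines) {
        return (int) (totalLines * 0.8);
    }

    public static void main(String[] args) {
        // Stand-in for the lines read from the data file.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            lines.add("record-" + i);
        }

        int cut = trainingSize(lines.size());
        List<String> train = lines.subList(0, cut);            // would be sent to Kafka
        List<String> test  = lines.subList(cut, lines.size()); // would be written to the test file

        System.out.println(train.size() + " training, " + test.size() + " test");
        // In the real producer, each line in `train` would be published to the
        // Kafka topic via the producer's send() call, and `test` written to disk.
    }
}
```

A sequential cut like this assumes the data file's rows are not ordered in a way that biases the split; if they are, shuffling the lines first would be the safer choice.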

We will create a Maven project for publishing the data into Kafka. The steps for creating the producer are as follows:

  1. Create a new Maven project with the com.learningstorm group ID and the ml-kafka-producer artifact ID.
  2. Add the following ...
