The first step in setting up the environment for our big data use case is to establish a Kafka node. Kafka behaves essentially like a first-in, first-out (FIFO) queue (strictly speaking, ordering is guaranteed within each partition), so we will use the simplest setup: a single node (broker). Kafka organizes data using topics, producers, consumers, and brokers.
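A single-node setup can be brought up with the scripts shipped in the Kafka distribution. The commands below are a sketch: paths assume you are inside an unpacked Kafka release, the topic name `test-topic` is illustrative, and newer Kafka releases can run in KRaft mode without ZooKeeper.

```shell
# Start ZooKeeper (required by older Kafka releases; KRaft-mode releases skip this)
bin/zookeeper-server-start.sh config/zookeeper.properties &

# Start a single Kafka broker with the default configuration
bin/kafka-server-start.sh config/server.properties &

# Create a topic with one partition and no replication (single-broker setup)
bin/kafka-topics.sh --create --topic test-topic \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1
```

With the broker running, producers and consumers can connect to `localhost:9092`.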
Important Kafka terminology:
- A broker is a server (node) that stores messages and serves them to consumers
- A producer is a process that writes data to the message queue
- A consumer is a process that reads data from the message queue
- A topic is the named queue that producers write to and consumers read from
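The roles above can be illustrated with a minimal in-memory analogue. This is not the Kafka API, just a sketch of the concepts: a "broker" holds named topics, each topic is a FIFO queue, producers append to the tail, and consumers read from the head.

```python
from collections import defaultdict, deque

class Broker:
    """Toy in-memory stand-in for a Kafka broker (illustrative only)."""

    def __init__(self):
        # topic name -> FIFO queue of messages
        self.topics = defaultdict(deque)

    def produce(self, topic, message):
        # A producer appends messages to the tail of the topic's queue.
        self.topics[topic].append(message)

    def consume(self, topic):
        # A consumer reads from the head, in the order the messages arrived.
        return self.topics[topic].popleft()

broker = Broker()
broker.produce("clicks", "event-1")
broker.produce("clicks", "event-2")
print(broker.consume("clicks"))  # → event-1  (first in, first out)
print(broker.consume("clicks"))  # → event-2
```

Real Kafka adds durability, replication, and consumer offsets on top of this basic queue model, but the produce/consume flow is the same.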
A Kafka topic is further subdivided into a number of partitions. We can split data from a particular topic across multiple brokers (nodes), both when we write to the topic and when we read our data at the ...
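How a message is routed to a partition can be sketched as follows. This assumes a key-based scheme (hash of the message key modulo the partition count), which is similar in spirit to Kafka's default partitioner for keyed messages, though the real implementation uses a different hash.

```python
import zlib

def assign_partition(key: str, num_partitions: int) -> int:
    """Sketch of key-based partition assignment (assumed scheme).

    Python's built-in hash() is salted per process for strings, so we use
    CRC32 to get a hash that is stable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All messages with the same key land in the same partition, so ordering
# is preserved per key even when a topic is spread across brokers.
p1 = assign_partition("user-42", 3)
p2 = assign_partition("user-42", 3)
assert p1 == p2
```

This is why the FIFO guarantee holds per partition rather than per topic: two messages with different keys may land in different partitions and be consumed in a different relative order.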