Working with Apache Kafka

Apache Kafka provides a data streaming pipeline across the cluster through its messaging service. Its architecture ensures a high degree of fault tolerance and message reliability, and it guarantees that the messages a producer writes to a given partition of a topic are kept in the order in which they were written. A record in Kafka is a key-value pair along with a timestamp, and it is addressed to a topic identified by name. A topic is a category of records through which producers and consumers communicate.
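The relationship between records, topics, and the per-partition ordering guarantee can be illustrated with a small in-memory model. This is a toy sketch, not the Kafka client API: `Record`, `Topic`, `partition_for`, and `append` are hypothetical names, and CRC32 is used only as a stand-in for the murmur2 hash that Kafka's default partitioner applies to record keys. The point it shows is that records with the same key always land in the same partition, so their relative order is preserved.

```python
import time
import zlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Toy stand-in for a Kafka record: a key, a value, and a timestamp."""
    key: str
    value: str
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


class Topic:
    """Hypothetical in-memory topic split into a fixed number of partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # CRC32 is an illustrative stand-in; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, record: Record) -> tuple[int, int]:
        """Append a record to its partition and return (partition, offset)."""
        p = self.partition_for(record.key)
        self.partitions[p].append(record)
        return p, len(self.partitions[p]) - 1


orders = Topic("orders", num_partitions=3)
for i in range(3):
    orders.append(Record(key="customer-42", value=f"order-{i}"))

p = orders.partition_for("customer-42")
print([r.value for r in orders.partitions[p]])  # ['order-0', 'order-1', 'order-2']
```

Ordering holds only within a partition in this model, as in Kafka: records with different keys may land in different partitions, and no order is implied between them.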

Kafka supports producer-consumer messaging: producers publish messages to topics, and consumers read them. Each topic partition maintains an ordered sequence of messages, and every message has an offset that represents its position (index) in that sequence. Kafka can be deployed on a multi-node cluster, as shown in the ...
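How offsets let a consumer track its position can be sketched with another toy model. Again this is an illustration under stated assumptions, not Kafka's API: `PartitionLog`, `Consumer`, `position`, and `poll` are hypothetical names chosen for the example. The log is append-only, each appended message receives the next offset, and the consumer simply remembers the offset of the next message it will read.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Hypothetical append-only log standing in for one Kafka partition."""
    messages: list[str] = field(default_factory=list)

    def append(self, value: str) -> int:
        """Append a message and return its offset (its index in the log)."""
        self.messages.append(value)
        return len(self.messages) - 1


@dataclass
class Consumer:
    """Toy consumer that tracks the offset of the next message to read."""
    log: PartitionLog
    position: int = 0

    def poll(self, max_records: int = 10) -> list[tuple[int, str]]:
        """Return up to max_records (offset, message) pairs and advance the position."""
        end = min(self.position + max_records, len(self.log.messages))
        batch = [(off, self.log.messages[off]) for off in range(self.position, end)]
        self.position = end
        return batch


log = PartitionLog()
for msg in ["a", "b", "c"]:
    log.append(msg)

consumer = Consumer(log)
print(consumer.poll(max_records=2))  # [(0, 'a'), (1, 'b')]
print(consumer.poll())               # [(2, 'c')]
```

Unlike a classic queue, reading does not remove anything from the log: a second `Consumer(log)` starting at position 0 would see all three messages again. This mirrors Kafka, where messages are retained independently of consumption and each consumer (or consumer group) keeps its own offset.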
