Kafka was designed at LinkedIn to get around some of the limitations of traditional message brokers, and to avoid the need to set up different message brokers for different point-to-point setups, as described in “Scaling Up and Out”. LinkedIn’s use cases were predominantly based around ingesting very large volumes of data such as page clicks and access logs in a unidirectional way, while allowing multiple systems to consume that data without affecting the performance of producers or other consumers. In effect, Kafka’s reason for being is to enable the sort of messaging architecture that the Universal Data Pipeline describes.
Given this end goal, other requirements naturally emerged. Kafka had to:
Be extremely fast
Allow massive message throughput
Support publish-subscribe as well as point-to-point
Not slow down with the addition of consumers; in ActiveMQ, both queue and topic performance degrade as the number of consumers on a destination rises
Be horizontally scalable; if a single broker that persists messages can do so no faster than its disk allows, then exceeding that rate means going beyond a single broker instance
Permit the retention and replay of messages
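The retention requirement in particular sets Kafka apart from traditional brokers: messages are kept on disk for a configured period regardless of whether they have been consumed. As an illustrative sketch (the property names below are real Kafka settings, but the values are arbitrary examples), retention can be configured broker-wide or overridden per topic:

```
# Broker-wide default (server.properties): keep log segments for 7 days
log.retention.hours=168

# Per-topic override (applied via kafka-configs.sh --alter --add-config):
# retain messages on this topic for 24 hours
retention.ms=86400000
```

Because messages remain available until retention expires, a consumer can rewind its offsets and replay history, something a JMS-style broker, which discards messages once they are acknowledged, cannot offer.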
In order to achieve all of this, Kafka adopted an architecture that redefined the roles and responsibilities of messaging clients and brokers. The JMS model is very broker-centric, where the broker is responsible for the distribution of messages, and clients only have to ...