Chapter 4. Kafka as Streaming Transport
In Chapter 2, we established that the revolution in streaming architecture design rests on a message-passing capability that meets certain fundamental requirements of large-scale systems. We recommended two technologies that fit those requirements well: Apache Kafka and MapR Streams. In this chapter, we examine Kafka, a pioneer in this style of messaging, in some detail.
Motivations for Kafka
Apache Kafka started life as an engineering project at LinkedIn intended to bring order to the way data moved between services. Most LinkedIn services were originally designed to rely heavily on a relational database and to use remote method invocation (RMI) between Java processes wherever communication was necessary.
Unfortunately, these two choices made it very difficult to cope with rapid growth in the number of services and in the amount of data being moved. Whenever one service needed to communicate with another, an adapter had to be developed and maintained. Moreover, each adapter made both sender and receiver harder to modify, since each service in a communicating pair effectively exposed a bit of its implementation to the other. As a result, it was extremely difficult to update systems. Just as important, it was very difficult to move as much information between services as was needed.
Systems like SOAP, CORBA, or Java’s ...