Foreword
Consensus protocols, stream processing, distributed systems—amid all the exciting ideas in the streaming world, it can be easy to overlook the role of the humble connector. But connectors solve the most fundamental problem in streaming: in a world of data at rest, how do you access streams at all? How do you plug your data-streaming platform into the rest of the business?
Kafka Connect’s aim is to make that easier. Before the Kafka Connect framework existed, we saw many people build integrations with Apache Kafka and repeat the same mistakes. Reading data from one system and writing it to another seems simple enough, but the process can have a lot of hidden complexity. What happens if a machine fails? What happens when requests time out? How do you scale up your integration? Each unique Kafka integration had to solve these problems from scratch. Kafka Connect was designed to separate out the logic of reading and writing to a particular system from a general framework for building, operating, and scaling these integrations.
Kafka Connect is different from other integration or connector layers in a lot of important ways:
It’s designed for streaming first.
It works with Kafka’s semantics to enable exactly-once delivery with systems that support it, and the strongest semantics possible with systems that don’t.
It lets you not just capture bytes, but also propagate some of the semantic structure of data.
It solves a lot of the complex problems in partitioning, ...