Chapter 4. Designing Effective Data Pipelines
In this chapter, you will learn how to build resilient and effective data pipelines using Kafka Connect. We explain the key concepts and decision points that data engineers and architects have to understand when assembling the components we introduced in Chapter 3.
In the first half of this chapter, we look at how to choose connector plug-ins for your pipelines. You need a connector, a converter, and, optionally, some transformations and predicates. We discuss how to evaluate connectors and identify, among the hundreds available in the community, the one that satisfies your production requirements. Then we discuss how to model your data as it flows through the pipeline and the formatting options available to you.
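To make these moving parts concrete, here is a minimal sketch of a source connector configuration in the standalone properties format. It assumes the FileStreamSourceConnector and JsonConverter that ship with Apache Kafka; the file path, topic name, and the transformation and predicate names (insertOrigin, isExample) are placeholders chosen for illustration, not recommendations.

    # Connector plug-in: reads lines from a file and produces them to a topic
    name=file-source
    connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
    tasks.max=1
    file=/tmp/example.txt
    topic=example-topic

    # Converter: serializes record values as schemaless JSON
    value.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable=false

    # Transformation: adds a static field to each record value
    transforms=insertOrigin
    transforms.insertOrigin.type=org.apache.kafka.connect.transforms.InsertField$Value
    transforms.insertOrigin.static.field=origin
    transforms.insertOrigin.static.value=file-source

    # Predicate: applies the transformation only to records headed
    # for topics whose names match the pattern
    transforms.insertOrigin.predicate=isExample
    predicates=isExample
    predicates.isExample.type=org.apache.kafka.connect.transforms.predicates.TopicNameMatches
    predicates.isExample.pattern=example-.*

Each of these pieces is covered in depth later in the chapter; the point here is simply how the connector, converter, transformation, and predicate fit together in a single configuration.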
The second half of this chapter focuses on the resiliency characteristics of Kafka Connect. Before building your pipeline, you need to identify the semantics your use cases require. For example, do you need to guarantee that every piece of data is delivered, or is it acceptable to lose some data in favor of increased throughput? We first dive into the inner workings of Kafka Connect and explain why it is a robust environment that can handle failures. Then we look at the semantics that sink and source pipelines can achieve, and the configuration options and trade-offs available to match your specific use cases.
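As a small preview of that trade-off, the following hypothetical snippet shows one way a source connector configuration could tune the underlying producer, assuming the worker's connector.client.config.override.policy permits such overrides; the exact values to use depend on your requirements, which we explore later in the chapter.

    # Favor delivery guarantees: wait until all in-sync replicas acknowledge
    producer.override.acks=all

    # Alternatively, favor throughput and accept possible data loss:
    # producer.override.acks=1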
Choosing a Connector
When building a data pipeline that uses Kafka ...