Chapter 9. Data Integration with ksqlDB
The first step in building a stream processing application with ksqlDB is to consider where the data you want to process currently lives, and where the enriched or transformed data will eventually be written. Since ksqlDB leverages Kafka Streams under the hood, the direct inputs and outputs of the application you build will always be Kafka topics. ksqlDB makes it simple to integrate other data sources as well, including popular third-party systems like Elasticsearch, PostgreSQL, MySQL, Google PubSub, Amazon Kinesis, MongoDB, and hundreds of others.
Of course, if your data already lives in Kafka and you don’t plan on writing the processed results to an external system, then working with ksqlDB’s data integration features (which are driven by Kafka Connect) isn’t required. However, should you ever need to read from or write to external systems, this chapter will provide the foundations you need to connect the appropriate data sources and sinks using ksqlDB and Kafka Connect.
This chapter isn’t a comprehensive guide to Kafka Connect, which is a separate API in the Kafka ecosystem and, accordingly, a topic about which much can be, and has been, written. We will provide enough background to get you started, and look at ksqlDB’s high-level abstractions for working with the Connect API. Some of the topics we will explore in this chapter include:
A quick Kafka Connect overview
Kafka Connect integration modes
Configuring Kafka Connect workers ...