5

Streaming Data with Kafka

There are several streaming platforms on the market, but Apache Kafka is the front-runner. Like Spark, Kafka is an open source project, but it focuses on being a distributed messaging system. Kafka is used in many kinds of applications, including microservices and data engineering. Confluent is the largest contributor to Apache Kafka and provides several products in the ecosystem, such as hosted Kafka, Schema Registry, Kafka Connect, and the Kafka REST API, among others. We will go through several areas of Confluent's Kafka platform, focusing on data processing and movement.

In this chapter, we will cover the following main topics:

  • Kafka architecture
  • Setting up Confluent Kafka
  • Kafka streams
  • Schema Registry
  • Spark and Kafka
  • Kafka Connect ...

