Message partitions

Suppose we have a purchase table and want to read the records for items that belong to a certain category, say, electronics. Normally, we would scan the table and filter out the other records. But what if we partitioned the table so that the records we want could be read quickly?
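The table analogy can be sketched in a few lines of Python. This is only an illustration: the `purchases` records, their field names, and the helper functions are hypothetical, and a plain in-memory dictionary stands in for a partitioned table.

```python
from collections import defaultdict

# Hypothetical purchase records; field names are assumptions for illustration.
purchases = [
    {"id": 1, "category": "electronics", "item": "phone"},
    {"id": 2, "category": "grocery", "item": "rice"},
    {"id": 3, "category": "electronics", "item": "laptop"},
    {"id": 4, "category": "clothing", "item": "jacket"},
]

def scan_filter(records, category):
    """Unpartitioned read: examine every record and keep the matches."""
    return [r for r in records if r["category"] == category]

def partition_by_category(records):
    """Group records by category once, so later reads touch one group only."""
    partitions = defaultdict(list)
    for r in records:
        partitions[r["category"]].append(r)
    return partitions

table_partitions = partition_by_category(purchases)

# Partitioned read: go straight to the group we care about.
electronics = table_partitions["electronics"]
```

Both approaches return the same records; the partitioned read simply avoids looking at rows from other categories.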

This is exactly what happens in Kafka, where topics are broken into partitions. Partitions are the unit of parallelism in Kafka, so, in general, the greater the number of partitions, the higher the achievable throughput. This does not mean that we should choose a huge number of partitions. We will talk about the pros and cons of increasing the number of partitions later.
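As a rough sketch of how keyed messages might be spread across the partitions of a topic, the Python below hashes each key and takes the result modulo the partition count. The function `choose_partition`, the partition count, and the sample messages are hypothetical, and CRC32 is used only because it is deterministic and in the standard library; it is not presented as the hash Kafka's own partitioner uses.

```python
import zlib

def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a deterministic hash (CRC32 here,
    chosen for illustration only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

NUM_PARTITIONS = 3  # assumed partition count for the example topic
topic_partitions = [[] for _ in range(NUM_PARTITIONS)]

# Hypothetical (key, value) messages keyed by purchase category.
messages = [
    ("electronics", "phone"),
    ("grocery", "rice"),
    ("electronics", "laptop"),
    ("clothing", "jacket"),
]

for key, value in messages:
    index = choose_partition(key, NUM_PARTITIONS)
    topic_partitions[index].append((key, value))
```

Because the hash is deterministic, every message with the same key lands in the same partition, while different partitions can be written and read independently of one another.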

While creating topics, you can ...
