The number of partitions in a Kafka topic sets the ceiling on how much work can happen in parallel, which makes it one of the most important factors in the topic’s maximum throughput.
Let’s see this in action. Imagine a simple Kafka producer writing to a topic named orders that has just one partition.
# Producer writing to 'orders' topic (1 partition)
kafka-console-producer \
--bootstrap-server kafka-broker-1:9092 \
--topic orders
# Simulate writing some data
> {"order_id": "123", "item": "widget", "quantity": 5}
> {"order_id": "124", "item": "gadget", "quantity": 2}
Now, let’s look at the consumer side. If we have one consumer instance, it can happily read from that single partition.
# Consumer reading from 'orders' topic (1 partition)
kafka-console-consumer \
--bootstrap-server kafka-broker-1:9092 \
--topic orders \
--from-beginning
# Output will show the messages
{"order_id": "123", "item": "widget", "quantity": 5}
{"order_id": "124", "item": "gadget", "quantity": 2}
The problem arises when we want to scale. If we add a second consumer instance in the same consumer group to read from this single-partition topic, Kafka’s consumer group rebalancing will assign all partitions (in this case, just the one) to one of the consumers. The other consumer will sit idle. This is a hard limit: within a consumer group, a partition is read by at most one consumer instance at any given time.
So, how do we scale throughput? By increasing the number of partitions. Let’s recreate our orders topic with, say, 6 partitions.
# Create topic 'orders' with 6 partitions
kafka-topics.sh \
--bootstrap-server kafka-broker-1:9092 \
--create \
--topic orders \
--partitions 6 \
--replication-factor 3 # Requires at least 3 brokers
Now, if we start 6 consumer instances, each consumer can be assigned one unique partition.
# Start 6 consumer instances in the same consumer group
for i in {1..6}; do
  kafka-console-consumer \
    --bootstrap-server kafka-broker-1:9092 \
    --topic orders \
    --group my-order-processor \
    --consumer-property partition.assignment.strategy=org.apache.kafka.clients.consumer.RangeAssignor \
    --consumer-property auto.offset.reset=earliest &
done
With 6 partitions and 6 consumers, each consumer instance reads from a distinct partition, and we’ve raised our potential consumption throughput by up to a factor of 6. The producer can also distribute messages across these 6 partitions, which spreads writes over more brokers and allows higher ingest rates.
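To check that each consumer really owns one partition, the group can be inspected with the consumer-groups tool. The sketch below assumes the Apache Kafka CLI scripts are on the PATH (some distributions ship them without the .sh suffix) and reuses the broker address and group name from the examples above. The helper name show_assignment is made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: same broker address as in the earlier examples.
BOOTSTRAP="kafka-broker-1:9092"

# Hypothetical helper: print which consumer owns which partition of a group,
# together with current offsets and lag.
show_assignment() {
  local group=$1
  kafka-consumer-groups.sh \
    --bootstrap-server "$BOOTSTRAP" \
    --describe \
    --group "$group"
}

# Only call the tool when it is actually installed.
if command -v kafka-consumer-groups.sh >/dev/null 2>&1; then
  show_assignment my-order-processor
else
  echo "kafka-consumer-groups.sh not found on PATH; skipping" >&2
fi
```

In the output, each partition row lists a CONSUMER-ID. With 6 partitions and 6 running consumers you would expect six different IDs, one per partition.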
The mental model is simple: each partition is an independent, ordered sequence of messages. Kafka brokers can serve multiple partitions in parallel, and consumers within a group can process partitions in parallel, with the constraint that a partition can only be consumed by one consumer instance in that group at a time. Therefore, the number of partitions caps consumer parallelism within a group and determines how widely producer writes can be spread across brokers.
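The one-consumer-per-partition constraint can be made concrete with a small simulation. This is a rough sketch that mimics the arithmetic of a range-style assignment for a single topic; it is not Kafka's actual assignor code, and the consumer names (consumer-0, consumer-1, ...) are hypothetical.

```shell
#!/usr/bin/env bash
# range_assign PARTITIONS CONSUMERS
# Splits partitions 0..PARTITIONS-1 into contiguous ranges, one per consumer.
# The first (PARTITIONS % CONSUMERS) consumers receive one extra partition.
# Assumes CONSUMERS is at least 1.
range_assign() {
  local partitions=$1 consumers=$2
  local per=$(( partitions / consumers ))
  local extra=$(( partitions % consumers ))
  local i start count
  for (( i = 0; i < consumers; i++ )); do
    start=$(( i * per + (i < extra ? i : extra) ))
    count=$(( per + (i < extra ? 1 : 0) ))
    if (( count == 0 )); then
      echo "consumer-$i: idle"
    else
      echo "consumer-$i: partitions $start..$(( start + count - 1 ))"
    fi
  done
}

range_assign 1 2   # one partition, two consumers
range_assign 6 6   # six partitions, six consumers
range_assign 6 4   # fewer consumers than partitions
```

With one partition and two consumers, the second consumer is reported as idle. With six partitions and four consumers, the first two consumers each take two partitions and the rest take one.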
The key levers you control are:
- --partitions during topic creation: This is the primary dial. More partitions mean more potential parallelism.
- Number of consumer instances: You can scale consumers up to the number of partitions in a topic for maximum parallelism.
- Keyed messages: If you send messages with a key (e.g., order_id), Kafka guarantees that all messages with the same key will land in the same partition. This ensures ordering for specific entities but can lead to hot partitions if keys are unevenly distributed.
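The key-to-partition mapping boils down to "hash the key, take it modulo the partition count". The sketch below illustrates only that idea. Kafka's default partitioner uses a murmur2 hash of the serialized key, whereas this example substitutes cksum because it is available on any POSIX system, so the partition numbers it prints will not match what a real producer chooses. The function name partition_for is hypothetical.

```shell
#!/usr/bin/env bash
# partition_for KEY PARTITIONS
# Illustration only: cksum (a CRC) stands in for Kafka's murmur2 hash.
partition_for() {
  local key=$1 partitions=$2
  local hash
  # cksum prints "<checksum> <byte count>"; keep only the checksum.
  hash=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
  echo $(( hash % partitions ))
}

# The same key always maps to the same partition for a fixed partition count.
for key in order-123 order-124 order-123; do
  echo "$key -> partition $(partition_for "$key" 6)"
done
```

Because the partition count is part of the calculation, changing it later can send an existing key to a different partition. To try keyed messages with the console producer, the properties parse.key=true and key.separator=: let you type lines such as order-123:{"order_id": "123"}.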
The relationship between partitions and throughput isn’t linear beyond a certain point. While more partitions generally mean more throughput, each partition adds overhead: more open file handles on brokers, more network connections, and increased coordination during rebalances. Also, adding partitions does not make an individual consumer any faster: each partition is still processed by one consumer instance at a time, so slow per-message logic caps the throughput of every partition. A common pitfall is to create far too many partitions, leading to resource exhaustion and slow rebalances, especially on older Kafka versions. The optimal number often requires experimentation, balancing throughput needs against broker resources and rebalance times.
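As a starting point for that experimentation, a commonly cited rule of thumb is to measure the throughput a single partition sustains on the producer side and on the consumer side, then take the larger of the two partition counts needed to hit your target. The sketch below assumes all three figures share one unit (for example MB/s). The numbers in the example call are invented, and the result is a lower bound to test against, not a recommendation.

```shell
#!/usr/bin/env bash
# ceil_div A B: integer ceiling of A / B (B must be > 0).
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# estimate_partitions TARGET PRODUCE_PER_PARTITION CONSUME_PER_PARTITION
# Returns max(ceil(TARGET / produce), ceil(TARGET / consume)).
estimate_partitions() {
  local target=$1 produce=$2 consume=$3
  local for_producers for_consumers
  for_producers=$(ceil_div "$target" "$produce")
  for_consumers=$(ceil_div "$target" "$consume")
  echo $(( for_producers > for_consumers ? for_producers : for_consumers ))
}

# Hypothetical measurements: 100 MB/s target,
# 20 MB/s produce and 10 MB/s consume per partition.
estimate_partitions 100 20 10   # prints 10
```

Here the consumer side is the constraint: 100 / 10 calls for 10 partitions, while the producer side alone would need only 5.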
The next concept to grapple with is how Kafka ensures message ordering, especially when dealing with keyed messages and multiple partitions.