Pub/Sub: Decouple Services, Scale Async

Kafka’s pub-sub model is actually a distributed commit log, and its real magic is that producers never know about consumers.

Imagine a busy trading floor. Orders (messages) are coming in fast from traders (producers). Instead of each trader having to know who needs which order, they just shout it out to a central ticker tape machine (Kafka). Different departments (consumers) then independently read from the ticker tape at their own pace, picking up only the orders relevant to them. This decoupling is key.

Here’s Kafka in action. Let’s say we have a Kafka cluster running on three brokers: kafka-broker-1:9092, kafka-broker-2:9092, and kafka-broker-3:9092. We want to set up a topic called stock_trades to track stock transactions.

First, we create the topic. We’ll use two partitions for parallelism and a replication factor of 3, so each partition has a copy on all three brokers and losing one broker doesn’t lose data.

kafka-topics.sh --bootstrap-server kafka-broker-1:9092 --create --topic stock_trades --partitions 2 --replication-factor 3

Now, a producer can start sending messages to this topic without knowing anything about who will consume them. Let’s simulate a producer sending a trade for Apple stock.

kafka-console-producer.sh --bootstrap-server kafka-broker-1:9092 --topic stock_trades

At the prompt, we type:

{"symbol": "AAPL", "price": 175.50, "quantity": 100, "timestamp": 1678886400}

Simultaneously, a consumer can start reading messages from the stock_trades topic. This consumer might be a system that updates a real-time stock ticker. We need to assign it to a consumer group, say stock_ticker_group.

kafka-console-consumer.sh --bootstrap-server kafka-broker-1:9092 --topic stock_trades --group stock_ticker_group --from-beginning

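When the consumer starts, it will read the message we just produced; --from-beginning tells a group with no committed offsets to start at the earliest available record. If another consumer joins the same stock_ticker_group, Kafka’s partition assignment mechanism ensures that each partition is consumed by only one consumer within that group. Because stock_trades has two partitions, a second consumer joining stock_ticker_group would leave one consumer reading from partition 0 and the other from partition 1. A third consumer in the same group would sit idle, since there is no partition left to assign.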
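
To make the one-owner-per-partition rule concrete, here is a minimal sketch in plain Python. It is a simplified round-robin model, not Kafka’s actual assignor, and the consumer names (ticker-1 and so on) and the function assign_partitions are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of ONE group, round-robin.

    Simplified model: every partition ends up with exactly one owner,
    and a consumer may end up with no partitions at all.
    """
    if not consumers:
        return {}
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for i, partition in enumerate(sorted(partitions)):
        owner = ordered[i % len(ordered)]
        assignment[owner].append(partition)
    return assignment


# One consumer in the group owns both partitions of stock_trades.
print(assign_partitions([0, 1], ["ticker-1"]))
# A second consumer joins: the two partitions are split between them.
print(assign_partitions([0, 1], ["ticker-1", "ticker-2"]))
# A third consumer gets nothing, because there are only two partitions.
print(assign_partitions([0, 1], ["ticker-1", "ticker-2", "ticker-3"]))
```

In this model, adding consumers beyond the partition count adds no read parallelism, which is why the partition count is usually sized with the expected number of consumers in mind.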

The core problem Kafka solves is distributed message queuing with high throughput and fault tolerance. Traditional message queues often require tight coupling between producers and consumers, or they struggle with massive scale. Kafka, by acting as a distributed commit log, allows producers to write data extremely fast and consumers to read it at their own pace, independently. This means a slow consumer doesn’t block producers, and producers can handle bursts of traffic without overwhelming downstream systems.

Internally, Kafka organizes data into topics, which are then divided into partitions. Each partition is an ordered, immutable sequence of records. These partitions are replicated across multiple brokers for fault tolerance. Producers write records to specific partitions (or let the client’s partitioner choose one, typically by hashing the record key), and consumers read from partitions. A consumer’s progress, meaning which records it has processed, is tracked as an offset within each partition. Offsets are committed back to Kafka per consumer group, so each group tracks its own position independently.
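
The log-and-offset idea can be sketched as a toy in-memory model. This is an illustration only, not how a broker stores data; the class PartitionLog and the second group name audit_group are hypothetical.

```python
class PartitionLog:
    """Toy model of one partition: an append-only list whose index is the offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset assigned to it."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset; nothing is removed."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
for price in (175.50, 175.55, 175.40):
    log.append({"symbol": "AAPL", "price": price})

# Each consumer group keeps its own committed position in the same log.
committed = {"stock_ticker_group": 0, "audit_group": 0}

batch = log.read(committed["stock_ticker_group"], max_records=2)
committed["stock_ticker_group"] += len(batch)  # "commit" after processing

print(committed)                                # {'stock_ticker_group': 2, 'audit_group': 0}
print(len(log.read(committed["audit_group"])))  # 3
```

Because reading never deletes records, the slower group still sees all three trades, and neither group’s pace affects the producer’s appends.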

The exact levers you control are primarily:

Topics: The logical channels for your data.
Partitions: The units of parallelism and ordering. More partitions allow for higher throughput if you have enough consumers to process them in parallel; ordering is guaranteed only within a single partition.
Replication Factor: Dictates how many brokers hold a copy of each partition’s data, directly impacting fault tolerance. A replication factor of 3 is common, meaning up to 2 of the brokers holding a partition can fail without losing committed data, provided the surviving replica was in sync.
Producers: Control serialization, batching, acknowledgment levels (acks=0, acks=1, acks=all), and partitioning strategies.
Consumers: Belong to consumer groups. They control their read speed and how they commit offsets.
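
As one example of a partitioning strategy, a keyed producer can hash the record key so that all records for the same key land in the same partition and stay in order. The sketch below assumes CRC32 as a stand-in hash (the Java client’s default partitioner hashes key bytes with murmur2), and choose_partition is a hypothetical helper, not a Kafka API.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a record key to a partition with a stable hash.

    CRC32 is used here only because it is in the standard library
    and gives the same result on every run.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


for symbol in ("AAPL", "MSFT", "AAPL"):
    print(symbol, "-> partition", choose_partition(symbol, 2))
```

Both AAPL records map to the same partition, so a consumer sees them in the order they were produced. Changing the partition count changes the mapping, which is one reason to choose it carefully up front.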

The most surprising thing about Kafka’s consumer offset management is that the consumer group’s offsets are themselves stored in a special Kafka topic (__consumer_offsets). This means Kafka’s own internal state is managed by the same distributed commit log mechanism, so committed offsets are replicated and survive broker failures just like any other topic data.
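
A rough way to picture this: every commit is a record keyed by (group, topic, partition), and the current position is whatever the latest record for that key says. The sketch below is a simplified model of that replay, not the real on-disk format; latest_offsets and the sample offsets are made up for illustration.

```python
def latest_offsets(commit_log):
    """Replay commit records in order; the last value per key wins."""
    state = {}
    for (group, topic, partition), offset in commit_log:
        state[(group, topic, partition)] = offset
    return state


commits = [
    (("stock_ticker_group", "stock_trades", 0), 5),
    (("stock_ticker_group", "stock_trades", 1), 3),
    (("stock_ticker_group", "stock_trades", 0), 9),  # newer commit for partition 0
]

print(latest_offsets(commits))
# {('stock_ticker_group', 'stock_trades', 0): 9, ('stock_ticker_group', 'stock_trades', 1): 3}
```

Since only the newest record per key matters, older commits can be discarded over time, which is what log compaction does for __consumer_offsets.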

The next concept you’ll want to explore is Kafka’s producer acks setting and how it impacts durability versus latency.