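The Kafka idempotent producer is a game-changer because it guarantees that producer retries never write a duplicate record to a partition. That is the producer-side building block for exactly-once semantics, not the whole guarantee: deduplication applies per partition and only for the lifetime of a single producer instance.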

Let’s see it in action. Imagine you have a Kafka topic user_events and you’re sending user login events. Without idempotence, if your producer retries sending a message due to a transient network glitch, you might end up with duplicate user_login events in Kafka. This could lead to incorrect user session counts or double-charging a user if the event was an order placement.

Here’s a simplified producer snippet (in Java, but the concept is universal):

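import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

Properties props = new Properties();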
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// Enable idempotence
props.put("enable.idempotence", "true");
props.put("acks", "all"); // 'all' is required for idempotence
props.put("max.in.flight.requests.per.connection", "5"); // Must be 5 or less with idempotence (5 is the default)

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

try {
    for (int i = 0; i < 100; i++) {
        String key = "user-" + i;
        String value = "{\"event\": \"user_login\", \"timestamp\": " + System.currentTimeMillis() + "}";
        ProducerRecord<String, String> record = new ProducerRecord<>("user_events", key, value);

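        // send() returns a Future<RecordMetadata>. Calling get() blocks until the broker
        // acknowledges the write and throws ExecutionException or InterruptedException
        // on failure, so the enclosing method must declare or handle those.
        RecordMetadata metadata = producer.send(record).get(); // Synchronous for clarity; pass a Callback to send() for async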
        System.out.println("Sent record (key: " + key + ") to partition " + metadata.partition() + " offset " + metadata.offset());
    }
} finally {
    producer.close();
}

When enable.idempotence is set to true, Kafka does a few things under the hood. When the producer starts, it asks the broker for a unique Producer ID (PID). It then stamps the records it sends to each partition with sequence numbers that increase by one per record, starting from zero. The broker keeps track of the most recent sequence numbers it has appended for each PID and partition. If a batch arrives carrying sequence numbers that have already been appended for that PID and partition, the broker acknowledges it and discards the duplicate.

The core problem idempotence solves is retries. Network issues, broker restarts, or temporary unavailability can cause a producer to lose track of whether a message was successfully committed. Without idempotence, a naive retry mechanism would resend the message, leading to duplicates.

To enable idempotence, set enable.idempotence to true (recent client versions enable it by default, but setting it explicitly documents the intent). Idempotence requires acks=all, and the producer uses that value if you leave acks unset, but it’s good practice to set acks=all yourself for clarity: the in-sync replicas must acknowledge the write before the producer considers it successful. Retries must also stay enabled (retries greater than 0, which is the default). If you explicitly enable idempotence and also explicitly set a conflicting value, the producer fails at construction with a ConfigException.

Additionally, max.in.flight.requests.per.connection must be 5 or less (the default is 5). Without idempotence, any value above 1 risks reordering: if batches M1, M2, and M3 are in flight and M2 fails while M3 succeeds, the retried M2 lands in the log after M3. With idempotence, the broker rejects any batch whose sequence number is not the next one it expects, so M3 cannot be appended ahead of M2, and the producer resends the failed batches in their original order. The cap of 5 exists because the broker only retains sequence metadata for the last five batches per producer and partition.
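The configuration rules above can be written down as a small pre-flight check. This is only a sketch: `IdempotenceConfigCheck` and its `problems` method are hypothetical helpers, not part of the Kafka client, and the defaults it falls back to (acks=all, retries at Integer.MAX_VALUE, five in-flight requests) are assumptions about what the producer uses when idempotence is on. The retries rule is included because idempotence only makes sense when retries are enabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check; this is NOT part of the Kafka client API.
class IdempotenceConfigCheck {

    // Returns a human-readable list of settings that conflict with idempotence.
    static List<String> problems(Properties props) {
        List<String> problems = new ArrayList<>();

        // Assumed defaults when a key is absent: acks=all, retries=MAX_VALUE, in-flight=5.
        String acks = props.getProperty("acks", "all");
        if (!acks.equals("all") && !acks.equals("-1")) {
            problems.add("acks must be 'all' (or -1), found: " + acks);
        }

        int retries = Integer.parseInt(
                props.getProperty("retries", String.valueOf(Integer.MAX_VALUE)));
        if (retries <= 0) {
            problems.add("retries must be greater than 0, found: " + retries);
        }

        int inFlight = Integer.parseInt(
                props.getProperty("max.in.flight.requests.per.connection", "5"));
        if (inFlight > 5) {
            problems.add("max.in.flight.requests.per.connection must be <= 5, found: " + inFlight);
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties good = new Properties();
        good.setProperty("enable.idempotence", "true");
        good.setProperty("acks", "all");
        System.out.println(problems(good)); // prints []

        Properties bad = new Properties();
        bad.setProperty("enable.idempotence", "true");
        bad.setProperty("acks", "1");
        bad.setProperty("max.in.flight.requests.per.connection", "10");
        System.out.println(problems(bad).size()); // prints 2
    }
}
```

Running a check like this in a unit test catches a conflicting override before the producer is ever constructed in production.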

The broker’s responsibility is to store and check these PIDs and sequence numbers. When a producer sends a message, it includes its PID and the message’s sequence number for that partition. The broker looks up the last sequence number it recorded for that PID and partition. If the incoming sequence number is exactly one greater than the recorded one, the broker appends the message to the log and updates the recorded sequence number. If the incoming sequence number is equal to or less than the recorded one, it’s a duplicate, and the broker simply acknowledges it without appending it again. If the incoming sequence number skips ahead and leaves a gap, the broker rejects it with an out-of-order sequence error, because an earlier message must be missing.
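To make the broker-side check concrete, here is a toy model of a single partition’s log. It is a sketch under simplifying assumptions, not Kafka’s implementation: `ToyPartitionLog` and its `Result` enum are hypothetical names, it tracks one sequence number per message rather than per batch, and it remembers only the last sequence per PID. The gap check reflects the fact that a real broker expects the next sequence number exactly and rejects anything that skips ahead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of one partition's log on a broker (hypothetical, for illustration only).
class ToyPartitionLog {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    private final List<String> log = new ArrayList<>();
    // Last sequence number appended for each producer ID (PID).
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();

    Result append(long pid, int seq, String value) {
        int last = lastSeqByPid.getOrDefault(pid, -1);
        if (seq <= last) {
            return Result.DUPLICATE;    // already in the log: acknowledge, do not append again
        }
        if (seq != last + 1) {
            return Result.OUT_OF_ORDER; // gap: an earlier message is missing
        }
        log.add(value);
        lastSeqByPid.put(pid, seq);
        return Result.APPENDED;
    }

    int size() {
        return log.size();
    }
}

class ToyPartitionLogDemo {
    public static void main(String[] args) {
        ToyPartitionLog log = new ToyPartitionLog();
        System.out.println(log.append(7L, 0, "login-a")); // prints APPENDED
        System.out.println(log.append(7L, 1, "login-b")); // prints APPENDED
        // The ack for seq 1 was lost in transit, so the producer retries it.
        System.out.println(log.append(7L, 1, "login-b")); // prints DUPLICATE
        // Seq 2 was never received, so seq 3 cannot be accepted yet.
        System.out.println(log.append(7L, 3, "login-d")); // prints OUT_OF_ORDER
        System.out.println("records in log: " + log.size()); // prints records in log: 2
    }
}
```

The retry of sequence 1 is acknowledged but never stored twice, which is the whole point of the mechanism.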

A crucial detail is that idempotence is scoped to a single producer instance. If your application restarts and a new producer instance is created, it will get a new PID, and its sequence numbers start over. This means that if the old instance wrote a message but crashed before seeing the acknowledgment, and the restarted application sends that message again, the broker cannot recognize it as a duplicate: it arrives under a different PID and looks like a brand-new message. For exactly-once semantics that survive application restarts, you need to combine idempotence with transactional producers, where a stable transactional.id lets the broker tie the new instance to the old one.
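The restart problem is easy to reproduce in a toy model. The sketch below is purely illustrative: `RestartDuplicateDemo` and its nested `Broker` class are hypothetical, the broker hands out PIDs from a simple counter, and there is only one partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: shows why a fresh PID defeats deduplication after a restart.
class RestartDuplicateDemo {

    static final class Broker {
        final List<String> log = new ArrayList<>();
        private final Map<Long, Integer> lastSeqByPid = new HashMap<>();
        private long nextPid = 0;

        // Every new producer instance receives a fresh PID.
        long assignPid() {
            return nextPid++;
        }

        // Returns true if the message was appended, false if it was dropped as a duplicate.
        boolean append(long pid, int seq, String value) {
            int last = lastSeqByPid.getOrDefault(pid, -1);
            if (seq <= last) {
                return false;
            }
            log.add(value);
            lastSeqByPid.put(pid, seq);
            return true;
        }
    }

    public static void main(String[] args) {
        Broker broker = new Broker();

        long pidBeforeCrash = broker.assignPid();
        broker.append(pidBeforeCrash, 0, "order-42"); // written, but the ack never arrives
        // A retry from the SAME instance is caught:
        System.out.println(broker.append(pidBeforeCrash, 0, "order-42")); // prints false

        // The application crashes and restarts; the new producer gets a new PID.
        long pidAfterRestart = broker.assignPid();
        // The application resends the message it was unsure about.
        System.out.println(broker.append(pidAfterRestart, 0, "order-42")); // prints true

        System.out.println(broker.log); // prints [order-42, order-42]
    }
}
```

The second copy is accepted because, from the broker’s point of view, (new PID, sequence 0) has never been seen before.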

The sequence number is not exposed directly to the user; it’s an internal Kafka mechanism. The producer handles incrementing it, and the broker handles storing and checking it. The PID is also managed internally. The producer will typically attempt to obtain a PID when it starts up and will regenerate it if it encounters certain errors that indicate the broker might have lost track of its previous PID.

The next thing you’ll likely encounter when dealing with message delivery guarantees is the need for transactional producers, which build upon idempotence to provide exactly-once semantics across multiple Kafka topics and partitions, even in the face of application failures.
