Kafka producers don’t actually guarantee that your message will be written to disk on the broker.

The acks setting in your Kafka producer configuration is the primary lever you have for balancing message durability and producer throughput. It dictates how many acknowledgments the producer must receive from the Kafka brokers before considering a message successfully sent.

Let’s see it in action. Imagine we have a simple producer script in Python, using the kafka-python client:

from kafka import KafkaProducer
import json
import time

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda x: json.dumps(x).encode('utf-8'),
    acks=1  # kafka-python's default (must be 0, 1, or 'all'), but we'll experiment with this
)

topic_name = 'my_test_topic'
message_data = {'message': 'Hello Kafka!'}

start_time = time.time()
for i in range(1000):
    producer.send(topic_name, message_data)  # Asynchronous: only buffers the message
    if (i + 1) % 100 == 0:
        producer.flush()  # Block until every buffered message has been sent

producer.flush()  # Send anything still buffered before stopping the clock
end_time = time.time()
print(f"Sent 1000 messages with acks=1 in {end_time - start_time:.4f} seconds.")

producer.close()

Now, let’s modify the acks setting and observe the performance difference.
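
One way to run that comparison is sketched below. It assumes the kafka-python client, a broker at localhost:9092, and the topic name from the script above; the helper names producer_config, time_sends, and compare_acks are made up for this illustration. Timings depend heavily on your cluster, so treat the numbers as relative, not absolute.

```python
import json
import time

# The values kafka-python documents for `acks`. Note that the string '1' is not one of them.
VALID_ACKS = (0, 1, "all")


def producer_config(acks, bootstrap_servers="localhost:9092"):
    """Build keyword arguments for KafkaProducer with the given acks value."""
    if acks not in VALID_ACKS:
        raise ValueError(f"acks must be one of {VALID_ACKS}, got {acks!r}")
    return {
        "bootstrap_servers": bootstrap_servers,  # assumed local broker
        "value_serializer": lambda x: json.dumps(x).encode("utf-8"),
        "acks": acks,
    }


def time_sends(acks, topic="my_test_topic", count=1000):
    """Send `count` messages and return elapsed seconds, including the final flush."""
    from kafka import KafkaProducer  # imported lazily; requires kafka-python

    producer = KafkaProducer(**producer_config(acks))
    try:
        start = time.perf_counter()
        for _ in range(count):
            producer.send(topic, {"message": "Hello Kafka!"})
        producer.flush()  # wait for all buffered messages before stopping the clock
        return time.perf_counter() - start
    finally:
        producer.close()


def compare_acks(topic="my_test_topic", count=1000):
    """Return a dict mapping each acks value to its elapsed time in seconds."""
    return {acks: time_sends(acks, topic, count) for acks in VALID_ACKS}
```

With a broker running, calling compare_acks() returns one timing per acks value. The flush is inside the timed region on purpose: send() only buffers the message, so stopping the clock earlier would mostly measure how fast the buffer fills.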

The acks Setting Explained

The acks configuration parameter on the Kafka producer controls the durability guarantees. It accepts three possible values:

  • acks=0: The producer does not wait for any acknowledgment from the broker. As soon as the request has been handed to the network, the producer considers the message sent. This offers the highest throughput but the lowest durability: if the request is lost in transit, or the broker fails before appending the message to its log, the message is lost and the producer is never told.

    • Diagnosis/Fix: If you set acks=0 and experience message loss, it’s because the producer didn’t wait for confirmation. Producer retries don’t help either, because the producer never learns that a send failed. There’s no "fix" in terms of configuration, as this is the intended behavior. The "fix" is to switch to acks=1 or acks=all.
    • Why it works: The producer fires and forgets. No network round trip for acknowledgment means faster sending.
  • acks=1: The producer waits for an acknowledgment from the leader of the partition. The leader acknowledges the message as soon as it has appended it to its local log, without waiting for any follower. This is the default in kafka-python, the client used above, and in the Java client before Kafka 3.0; from 3.0 onward the Java producer defaults to acks=all. It offers a good balance between durability and performance. If the leader crashes after acknowledging but before the followers have replicated the message, the message can still be lost.

    • Diagnosis/Fix: If you set acks=1 and experience message loss, it’s likely due to a leader failure before replication completed. Raising min.insync.replicas does not help here, because that setting is only enforced for acks=all. To close the window, switch the producer to acks=all and set min.insync.replicas to at least 2 (e.g., min.insync.replicas=2 for a replication factor of 3).
    • Why it works: The producer gets confirmation from the primary copy of the data. This is usually very fast, but doesn’t guarantee against data loss if the leader fails immediately after acknowledging.
  • acks=all (or -1): The producer waits until every replica currently in the in-sync replica set (ISR) of the partition has the message. This is the strongest guarantee of durability. The min.insync.replicas setting (broker-wide or per topic) puts a floor under that guarantee: if fewer replicas than that are in sync, the leader rejects the write instead of acknowledging it.

    • Diagnosis/Fix: If you set acks=all and experience slower sends, that’s expected. If you experience message loss even with acks=all, check min.insync.replicas. With its default of 1, the ISR can shrink to the leader alone, and an acknowledged message is lost if that leader then fails. If min.insync.replicas is higher than the number of replicas currently in sync, the broker rejects writes with a NotEnoughReplicas error; the producer retries, and the message is dropped if retries are exhausted or the send times out first. Ensure min.insync.replicas is less than or equal to the topic’s replication factor, and ideally lower, so that writes continue while one broker is down.
    • Why it works: The message must be in the log of every in-sync replica, not only the leader’s, before the producer considers it "sent." With min.insync.replicas of 2 or more, that means at least two brokers, which provides strong durability against broker failures.
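
Whichever level you choose, the producer only finds out about a failed write if it looks at the result of send(). The sketch below shows one way to do that. It assumes a kafka-python style producer, where send() returns a future whose get(timeout=...) blocks and raises on failure; the helper name send_and_confirm is invented for this example.

```python
def send_and_confirm(producer, topic, value, timeout_s=10.0):
    """Send one message and block until it is acknowledged or fails.

    Returns (True, record_metadata) on success and (False, exception) on failure.
    """
    future = producer.send(topic, value)
    try:
        # With acks=1 or acks='all' this waits for the broker's response.
        # With acks=0 there is no response to wait for, so success here
        # only means the request was sent.
        metadata = future.get(timeout=timeout_s)
    except Exception as exc:  # kafka-python raises KafkaError subclasses here
        return False, exc
    return True, metadata
```

Blocking on every message removes the benefit of batching, so this pattern suits low-volume, high-value messages. For bulk traffic, attaching callbacks to the future and calling flush() periodically is usually the better fit.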

The Mental Model: Durability vs. Latency

At its core, acks is a trade-off.

  • acks=0: Fire and forget. Max speed, min safety.
  • acks=1: Leader acknowledged. Good speed, decent safety.
  • acks=all: All ISRs acknowledged. Max safety, min speed.

The performance difference is directly tied to how long the producer waits for a response. acks=0 involves no wait at all. acks=1 involves one round trip to the leader. acks=all is still a single round trip from the producer’s point of view, but the leader holds its response until every follower in the ISR has fetched the message, so replication latency is added to each request.

The min.insync.replicas setting, which can be set broker-wide or per topic, is crucial. It defines the minimum number of in-sync replicas, leader included, that must exist for an acks=all write to be accepted. It is a floor, not a target: the leader still waits for every replica currently in the ISR, and min.insync.replicas only decides when the ISR has become too small to accept the write at all. If it is set to N and fewer than N replicas are in sync, the broker rejects acks=all writes with a NotEnoughReplicas error. If N is greater than the topic’s replication factor, that condition is permanent and such writes never succeed. A common setup for high durability is a replication factor of 3 and min.insync.replicas=2. This means every acknowledged message is on at least two brokers, and writes continue while one broker is down.
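
As a rough sketch, the replication factor 3 / min.insync.replicas=2 setup could be created per topic with kafka-python’s admin client. This assumes a cluster with at least three brokers reachable at localhost:9092; the helper names durable_topic_settings and create_durable_topic are hypothetical.

```python
def durable_topic_settings(replication_factor=3, min_insync_replicas=2):
    """Validate a replication factor / min.insync.replicas pair and describe it."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError(
            "min.insync.replicas must be between 1 and the replication factor"
        )
    return {
        "replication_factor": replication_factor,
        # Topic-level configs are passed to Kafka as strings
        "topic_configs": {"min.insync.replicas": str(min_insync_replicas)},
        # Replicas that can be out of sync while acks=all writes are still accepted
        "tolerated_unavailable_replicas": replication_factor - min_insync_replicas,
    }


def create_durable_topic(name, num_partitions=1,
                         bootstrap_servers="localhost:9092", **settings_kwargs):
    """Create a topic with the validated settings (requires a running cluster)."""
    from kafka.admin import KafkaAdminClient, NewTopic  # requires kafka-python

    settings = durable_topic_settings(**settings_kwargs)
    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        admin.create_topics([
            NewTopic(
                name=name,
                num_partitions=num_partitions,
                replication_factor=settings["replication_factor"],
                topic_configs=settings["topic_configs"],
            )
        ])
    finally:
        admin.close()
```

Setting min.insync.replicas equal to the replication factor passes the validation but tolerates zero unavailable replicas, so a single broker restart would block acks=all writes.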

For example, take a topic with a replication factor of 3 and min.insync.replicas=2. While all three replicas are in sync, a producer with acks=all is acknowledged only after all three have the message. If one follower falls out of the ISR, the leader waits for the remaining two. If a second replica drops out, the leader rejects the write rather than acknowledge a message held by a single broker. Waiting for the slowest in-sync follower is the synchronization overhead that impacts performance.
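
The interaction between acks, the ISR, and min.insync.replicas can be written down as a small decision function. This is a simplified model for reasoning, not broker code. It assumes Kafka’s documented behavior, in which acks=all waits for the full current ISR and min.insync.replicas acts as a floor. The NotEnoughReplicas class and the function name are stand-ins defined only for this sketch.

```python
class NotEnoughReplicas(Exception):
    """Stand-in for the NotEnoughReplicas error a broker returns."""


def replicas_before_ack(acks, isr_size, min_insync_replicas=1):
    """Return how many replicas must hold a message before the producer is acknowledged.

    isr_size counts the leader, which is always a member of its own ISR.
    """
    if isr_size < 1:
        raise ValueError("the ISR always contains at least the leader")
    if acks == 0:
        return 0  # fire and forget: no broker confirmation at all
    if acks == 1:
        return 1  # leader only; min.insync.replicas is not consulted
    if acks in ("all", -1):
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR has {isr_size} replica(s), need {min_insync_replicas}"
            )
        return isr_size  # every current in-sync replica, not just the minimum
    raise ValueError(f"unsupported acks value: {acks!r}")
```

For a topic with three replicas and min.insync.replicas=2, the function returns 3 while all replicas are in sync, 2 after one drops out, and raises once only the leader remains.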

The next thing you’ll likely wrestle with is how to configure retries and idempotence to further enhance durability without sacrificing too much performance.
