Kafka’s replication factor and min.insync.replicas settings are fundamental to ensuring data durability and availability, but their interaction can be surprisingly subtle and lead to unexpected behavior if not understood.
Let’s see this in action. Imagine you have a Kafka topic named my-topic with 3 partitions.
# Check topic configuration
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic
Output might look like this:
Topic:my-topic PartitionCount:3 ReplicationFactor:3 Configs:
Topic: my-topic Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: my-topic Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: my-topic Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Here, ReplicationFactor is 3, meaning each partition has 3 copies. Isr (In-Sync Replicas) lists the replicas that are caught up and considered up-to-date. Ideally, Isr should match the Replicas list.
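To make the Replicas/Isr comparison concrete, here is a minimal sketch that parses one partition line of the describe output and reports whether any assigned replica is missing from the ISR. The class name `DescribeLine` and its methods are hypothetical helpers written for this illustration, not part of Kafka, and the parsing assumes the line layout shown above.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of any Kafka library.
final class DescribeLine {

    // Extracts the comma-separated broker ids that follow a label such as "Replicas:" or "Isr:".
    static Set<Integer> brokerIds(String line, String label) {
        int start = line.indexOf(label);
        if (start < 0) {
            throw new IllegalArgumentException("label not found: " + label);
        }
        // Take the first whitespace-delimited token after the label, e.g. "1,2,3".
        String rest = line.substring(start + label.length()).trim();
        String ids = rest.split("\\s+")[0];
        return Arrays.stream(ids.split(","))
                .filter(s -> !s.isEmpty())
                .map(Integer::parseInt)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // A partition is under-replicated when some assigned replica is missing from the ISR.
    static boolean isUnderReplicated(String line) {
        return !brokerIds(line, "Isr:").containsAll(brokerIds(line, "Replicas:"));
    }
}
```

For day-to-day checks you would not parse the output yourself: kafka-topics.sh accepts --under-replicated-partitions together with --describe to list only the partitions in this state.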
Now, let’s look at the broker-level configuration for min.insync.replicas. This is typically set in server.properties.
# Example server.properties
broker.id=1
listeners=PLAINTEXT://localhost:9091
# ... other configurations ...
min.insync.replicas=2
If min.insync.replicas is set to 2 at the broker level, it becomes the default for every topic that does not override it. For a producer request sent with acks=all, the partition must then have at least 2 in-sync replicas (the leader included), or the broker rejects the write.
The setting only matters when you produce data with acks=all; with acks=0 or acks=1 it is ignored. With acks=all, the leader waits until every replica currently in the ISR has the record before acknowledging it, and min.insync.replicas sets the smallest ISR size at which the leader will accept the write at all.
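The acceptance rule can be sketched as a tiny model. This is a simplification under stated assumptions, not broker code: `AckRule`, `accepts`, and `replicasWaitedFor` are hypothetical names, and the model ignores timeouts, retries, and replicas that fall out of the ISR while a request is in flight.

```java
// Hypothetical, simplified model of how the partition leader treats a produce request.
final class AckRule {

    enum Acks { NONE, LEADER, ALL } // acks=0, acks=1, acks=all

    // Returns true if the leader would accept the write in this model.
    // isrSize counts the leader itself, because the leader is always a member of the ISR.
    static boolean accepts(Acks acks, int isrSize, int minInsyncReplicas) {
        if (isrSize < 1) {
            return false; // no leader available, so nothing can be written
        }
        if (acks != Acks.ALL) {
            return true; // acks=0 and acks=1 are not subject to min.insync.replicas
        }
        return isrSize >= minInsyncReplicas;
    }

    // Number of replicas that must hold the record before the producer is acknowledged.
    static int replicasWaitedFor(Acks acks, int isrSize) {
        switch (acks) {
            case ALL:
                return isrSize; // every current ISR member, not just min.insync.replicas of them
            case LEADER:
                return 1;       // the leader's own log only
            default:
                return 0;       // acks=0: the producer does not wait at all
        }
    }
}
```

Note the asymmetry the model makes visible: min.insync.replicas gates whether the write is accepted, while the size of the ISR at that moment determines how many replicas the leader waits for.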
Consider this producer configuration:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9091,localhost:9092,localhost:9093");
props.put("acks", "all"); // Crucial for min.insync.replicas to matter
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// With min.insync.replicas=2 and acks=all, the producer gets an ACK only after every
// replica currently in the ISR has the record, and only if the ISR has at least 2 members.
This setup ensures that even if one broker fails immediately after the write, the data is still stored on at least one other replica. If min.insync.replicas were 1 with acks=all, a write could be acknowledged while the leader is the only in-sync replica, so a single copy is all that is guaranteed. If min.insync.replicas were 3 (with a replication factor of 3), losing any one replica would block acks=all writes to every partition that replica hosts.
The most easily overlooked fact about min.insync.replicas is that it is evaluated per partition, not per topic or per cluster. If min.insync.replicas is 2 and you have 10 partitions, each partition must have at least 2 in-sync replicas for acks=all writes to that partition to succeed; a partition that falls short rejects writes while the others keep accepting them. The broker-level value is only a default, and it can be overridden per topic.
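Because the check is made per partition, one topic can be partly writable. The sketch below lists the partitions of a topic that would reject acks=all writes, given each partition's current ISR. `PartitionWritability` and `blockedPartitions` are hypothetical names used only for this illustration, and the ISR map is made-up sample input rather than real cluster state.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of any Kafka library.
final class PartitionWritability {

    // Returns, in ascending order, the partitions whose ISR is too small for acks=all writes.
    static List<Integer> blockedPartitions(Map<Integer, List<Integer>> isrByPartition,
                                           int minInsyncReplicas) {
        return isrByPartition.entrySet().stream()
                .filter(e -> e.getValue().size() < minInsyncReplicas)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

For example, with an ISR of [1,2,3] for partition 0, [2] for partition 1, and [3,1] for partition 2, and min.insync.replicas=2, only partition 1 is blocked; keyed messages that hash to partitions 0 and 2 still go through.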
Here’s how you’d set it per topic:
kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config min.insync.replicas=2
With replication.factor=3 and min.insync.replicas=2, a producer with acks=all tolerates the failure of one broker. If two of the three brokers fail, every partition is left with a single in-sync replica, and the producer's sends fail (typically with a NotEnoughReplicasException in the Java client) because the min.insync.replicas requirement can no longer be met.
The core trade-off min.insync.replicas controls is between write availability and durability. With acks=all and min.insync.replicas=1, the leader still waits for every replica in the ISR, but the ISR is allowed to shrink to the leader alone; at that point a write is acknowledged with only one copy and is lost if that broker's data is lost. With min.insync.replicas greater than 1, the leader refuses acks=all writes unless at least that many replicas are in sync, so an acknowledged record exists on at least that many brokers and survives a leader failure immediately after the acknowledgment.
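The arithmetic behind this can be written down directly. The sketch below is a back-of-the-envelope model, assuming acks=all, one replica per broker, and unclean leader election disabled; `ReplicationMath` and its method names are hypothetical and exist only for this illustration.

```java
// Hypothetical helper for illustration only; not part of any Kafka library.
final class ReplicationMath {

    private static void validate(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                    "need 1 <= min.insync.replicas <= replication.factor");
        }
    }

    // How many replicas a partition can lose while still accepting acks=all writes.
    static int writeFailuresTolerated(int replicationFactor, int minInsyncReplicas) {
        validate(replicationFactor, minInsyncReplicas);
        return replicationFactor - minInsyncReplicas;
    }

    // How many replica losses an already acknowledged record is guaranteed to survive,
    // given that at least min.insync.replicas copies existed when it was acknowledged.
    static int ackedRecordSurvives(int replicationFactor, int minInsyncReplicas) {
        validate(replicationFactor, minInsyncReplicas);
        return minInsyncReplicas - 1;
    }
}
```

With a replication factor of 3, a minimum of 2 gives one tolerated failure on both counts, which is why that pairing is a common choice; a minimum of 1 trades durability for availability, and a minimum of 3 does the reverse.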
The one thing most people don’t realize is that min.insync.replicas is a minimum requirement for acknowledgment, not a guarantee that all replicas will always be in sync. It defines the threshold for producer success. If a broker is down and cannot be brought back into sync quickly, and the number of available in-sync replicas for a partition drops below min.insync.replicas, then producers using acks=all will fail to write to that partition until enough replicas are back online.
The next thing you’ll likely want to understand is how the producer’s acks setting interacts with min.insync.replicas and replication.factor.