Kafka Rack Awareness: Survive Data Center Zone Failures (2026)

Kafka’s rack awareness is the mechanism that ensures your data remains available even if an entire data center zone goes offline.

Let’s see it in action. Imagine a Kafka cluster with three brokers, each in a different "rack" (representing a data center zone). We’ve configured a topic to have a replication factor of 3.

# Simulate a broker configuration file (server.properties)
broker.id=0
listeners=PLAINTEXT://broker0.example.com:9092
log.dirs=/opt/kafka/logs
zookeeper.connect=zk0.example.com:2181
# This is the crucial part for rack awareness
broker.rack=us-east-1a

broker.id=1
listeners=PLAINTEXT://broker1.example.com:9092
log.dirs=/opt/kafka/logs
zookeeper.connect=zk0.example.com:2181
broker.rack=us-east-1b

broker.id=2
listeners=PLAINTEXT://broker2.example.com:9092
log.dirs=/opt/kafka/logs
zookeeper.connect=zk0.example.com:2181
broker.rack=us-east-1c

Now, let’s create a topic with a replication factor of 3 and ensure it’s distributed across these racks:

kafka-topics.sh --create --topic my-replicated-topic --bootstrap-server broker0.example.com:9092 --replication-factor 3 --partitions 1

When you inspect the topic’s partition assignments, you’ll see that Kafka has intelligently placed the leader and its replicas on brokers residing in different racks.

kafka-topics.sh --describe --topic my-replicated-topic --bootstrap-server broker0.example.com:9092

Topic: my-replicated-topic PartitionCount: 1 ReplicationFactor: 3 Isr: 0,1,2
  Leader: 0
  Replicas: 0,1,2
  Isr: 0,1,2

If broker0.example.com (in rack us-east-1a) goes down, Kafka will automatically elect a new leader from the remaining brokers (broker1 or broker2), both of which are in different racks. This ensures that producers and consumers can continue to operate without interruption, as the data is still available on healthy brokers in other zones.

The core problem rack awareness solves is data loss and unavailability during localized hardware or network failures. Without it, if a broker holding a replica fails, and that replica was the only copy of a partition in a particular zone, that partition becomes unavailable. More critically, if the broker holding the leader replica fails and there are no other replicas in different zones, the entire partition is lost until the leader comes back online. Kafka’s rack awareness ensures that replicas are spread across distinct failure domains (racks).

Here’s how it works internally: When a topic is created or altered, Kafka’s controller (typically running on one of the brokers) consults the broker.rack configuration for each broker. It then uses a placement strategy to distribute replicas for each partition such that no two replicas for the same partition reside within the same rack. The default strategy aims to place one replica in each rack if the replication factor is less than or equal to the number of racks, and then distribute the remaining replicas as evenly as possible across racks.

The key configuration parameter is broker.rack. This value is read by the broker upon startup and registered with ZooKeeper. Kafka clients and brokers use this information to make informed decisions about data placement and leadership election. When you create a topic with replication-factor N, Kafka’s controller attempts to assign replicas to N distinct racks. If N is greater than the number of available racks, Kafka will distribute replicas as best as it can, but it cannot guarantee a unique rack for every replica.

You can dynamically change the broker.rack configuration without restarting brokers, and Kafka will gradually rebalance partitions to adhere to the new rack assignments. This is particularly useful when reconfiguring your data center topology or performing maintenance.

The inter.broker.protocol.security.protocol and advertised.listeners configurations are indirectly related, as they define how brokers communicate. If these are not correctly configured to allow brokers in different racks to reach each other, rack awareness will fail, not because of the rack configuration itself, but because the brokers cannot communicate to maintain replication.

The most surprising aspect of Kafka’s rack awareness is that it doesn’t inherently prevent data loss if your replication factor is too low for your rack count. If you have 5 racks and a replication factor of 3, and two brokers in the same rack fail, you could lose data for a partition if its leader and one other replica were on those two failed brokers. The system distributes to survive, but it doesn’t magically create more copies than you’ve asked for.

The next concept you’ll likely encounter is understanding how Kafka handles controller failover and the implications for partition leadership.