Kafka brokers tolerate mixed-version clusters well, which is what makes a rolling upgrade possible: you swap out old versions for new ones, one broker at a time, without taking the cluster offline for your producers or consumers.

Let’s watch this happen with a tiny, simulated Kafka cluster. Imagine we have two brokers, broker-1 and broker-2, both running an older version (say, 2.8.0). We want to upgrade them to 3.6.1.

First, we stop broker-2.

# On broker-2's machine:
/opt/kafka/bin/kafka-server-stop.sh
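
kafka-server-stop.sh only sends a termination signal and returns; the broker may still be shutting down when the prompt comes back. Before replacing binaries it helps to confirm the process has actually exited. Here is a minimal sketch of that wait. The function name wait_for_exit and the 120-second default are assumptions for illustration, not part of Kafka.

```shell
# wait_for_exit PID [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until the process is gone.
# Returns 0 when the process has exited, 1 if it is still alive at the timeout.
# Run it as the broker's user (or root): kill -0 also fails on a permission
# error, which this sketch would misread as "exited".
wait_for_exit() {
  local pid="$1" timeout="${2:-120}" waited=0
  # An empty PID means there was nothing to wait for.
  [ -z "$pid" ] && return 0
  # kill -0 sends no signal; it only tests whether the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example use on broker-2's machine (capture the PID before stopping):
#   pid=$(pgrep -f 'kafka\.Kafka' | head -n 1)
#   /opt/kafka/bin/kafka-server-stop.sh
#   wait_for_exit "$pid" 120 || echo "broker still running" >&2
```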

Now, we upgrade broker-2’s binaries and configuration. Two settings matter for a rolling upgrade: inter.broker.protocol.version and log.message.format.version. Both should be pinned to the version the cluster is currently running (2.8.0 in this case), not the target version. Pinning inter.broker.protocol.version tells the upgraded broker to keep speaking the protocol that broker-1 still understands; left unset, the new binaries would default to the 3.6 protocol. Pinning log.message.format.version keeps the message format unchanged; if you are upgrading from 0.11.0 or later and have never overridden it, that line can be omitted.

Here’s a snippet from server.properties on broker-2 after the upgrade but before starting it:

# ... other configurations ...
# Assuming broker-2 has ID 1. Comments must sit on their own line:
# in a properties file, a trailing "# ..." becomes part of the value.
broker.id=1
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://broker-2.example.com:9092
zookeeper.connect=zookeeper.example.com:2181
# Crucial for rolling upgrade: pin both to the version currently running
inter.broker.protocol.version=2.8.0
log.message.format.version=2.8.0
# ... other configurations ...
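
Hand-editing server.properties on every broker invites typos and duplicate keys, so it can help to script the change. The sketch below sets a key idempotently: it rewrites the line if the key exists and appends it otherwise. set_prop is a hypothetical helper name, and the sketch assumes the value contains no `|` or `&` characters and that the file ends with a newline.

```shell
# set_prop FILE KEY VALUE
# Hypothetical helper: replace an existing KEY=... line, or append one.
set_prop() {
  local file="$1" key="$2" value="$3"
  local pattern
  # Escape the dots so the key is matched literally, not as a regex wildcard.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^${pattern}=" "$file"; then
    # -i.bak edits in place and keeps a backup; works with GNU and BSD sed.
    sed -i.bak "s|^${pattern}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example use:
#   set_prop /opt/kafka/config/server.properties inter.broker.protocol.version 2.8.0
```

Running it twice with the same arguments leaves a single line for the key, so it is safe to re-run during a retried upgrade.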

We start broker-2 with this new configuration.

# On broker-2's machine:
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties

Kafka’s controller (which in this two-broker setup is broker-1, the only broker that stayed up) will detect that broker-2 has rejoined the cluster and will propagate updated metadata. Crucially, because inter.broker.protocol.version is set to 2.8.0, broker-2 can still participate in replication and serve requests using the older protocol. It does not rewrite its existing log segments: the on-disk message format has not changed since Kafka 0.11.0, so segments written by 2.8.0 are read as they are. broker-2 only has to fetch the messages it missed while it was down, after which it rejoins the in-sync replica set (ISR) for its partitions.

Once broker-2 has caught up and is back in the ISR for all of its partitions, we repeat the process for broker-1. Stop it, update binaries and config (again pinning inter.broker.protocol.version=2.8.0 and log.message.format.version=2.8.0), and restart.
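
One way to decide when it is safe to move on to the next broker is to wait until no partition is under-replicated. The sketch below polls kafka-topics.sh with its --under-replicated-partitions filter and treats empty output as "all replicas in sync". The function name wait_for_isr, the retry defaults, and the /opt/kafka path are assumptions for illustration.

```shell
# Path to the stock CLI tool; override KAFKA_TOPICS if Kafka lives elsewhere.
KAFKA_TOPICS="${KAFKA_TOPICS:-/opt/kafka/bin/kafka-topics.sh}"

# wait_for_isr BOOTSTRAP [ATTEMPTS] [SLEEP_SECONDS]
# Hypothetical helper: returns 0 once no partition is under-replicated,
# 1 if that has not happened after ATTEMPTS polls.
wait_for_isr() {
  local bootstrap="$1" attempts="${2:-60}" pause="${3:-5}"
  local i=0 out
  while [ "$i" -lt "$attempts" ]; do
    # With this filter, --describe lists only partitions whose ISR is
    # smaller than the replica set. A successful call with empty output
    # is taken to mean every replica has caught up.
    if out=$("$KAFKA_TOPICS" --bootstrap-server "$bootstrap" \
              --describe --under-replicated-partitions 2>/dev/null) \
       && [ -z "$out" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  return 1
}

# Example use before touching broker-1:
#   wait_for_isr broker-1.example.com:9092 || echo "replicas not in sync" >&2
```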

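After both brokers are running the new version (3.6.1), we change inter.broker.protocol.version on all brokers to 3.6.1 and restart them one at a time, again waiting for each broker to rejoin before moving on. The log.message.format.version override can be removed at this point: it is deprecated since Kafka 3.0 and ignored once inter.broker.protocol.version is 3.0 or higher. This final rolling restart ensures all brokers are speaking the latest protocol. Once the protocol version has been bumped, downgrading to the old release is no longer possible, so it is worth running on the new binaries for a while before taking this step.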

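This staggered approach keeps a leader available for every partition throughout the upgrade, because older and newer brokers can interoperate and only one broker is down at a time. It does depend on the topics being replicated across both brokers, and on min.insync.replicas being low enough (1 in this two-broker setup) that producers using acks=all can still write while one broker is stopped.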

The system works because Kafka versions the broker software, the inter-broker protocol, and the message format separately. inter.broker.protocol.version dictates how brokers communicate with each other, while log.message.format.version controls the format in which messages are written to log segments. During a rolling upgrade, you keep both at the older version while replacing the binaries, so an upgraded broker behaves like an old one on the wire and on disk. Once all brokers are running the new software, you bump inter.broker.protocol.version to the new version with a second rolling restart, which enables the features that depend on the new protocol.
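
The rule underneath all of this is that the inter-broker protocol version must never be newer than the oldest broker software in the cluster. A small pre-flight check can make that explicit before you bump the setting. This is a sketch only: major_minor and ibp_is_safe are hypothetical helper names, the version list is supplied by hand, it assumes versions of 1.0 or later (two-part major.minor), and it relies on `sort -V` from GNU coreutils.

```shell
# major_minor VERSION -> prints "X.Y" (for example 3.6.1 -> 3.6)
major_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# ibp_is_safe IBP BINARY_VERSION...
# Hypothetical check: succeeds when IBP is no newer than the oldest
# broker binary version listed.
ibp_is_safe() {
  local ibp oldest v
  ibp=$(major_minor "$1")
  shift
  # Version-sort the binaries and keep the lowest one.
  oldest=$(for v in "$@"; do major_minor "$v"; done | sort -V | head -n 1)
  # IBP is safe when it sorts first (or ties) against the oldest binary.
  [ "$(printf '%s\n%s\n' "$ibp" "$oldest" | sort -V | head -n 1)" = "$ibp" ]
}

# Example use:
#   ibp_is_safe 2.8.0 2.8.0 3.6.1 && echo "ok to run with 2.8 protocol"
#   ibp_is_safe 3.6.1 2.8.0 3.6.1 || echo "broker-1 is still on 2.8.0, do not bump yet"
```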

Most people focus on swapping the binaries during upgrades, but what makes zero-downtime migration possible is the ability to run new binaries while they still speak the older inter.broker.protocol.version. This decouples installing new software from switching the cluster to new behavior, so each step can be done one broker at a time without impacting cluster operations.

The next hurdle is understanding how to manage topic configurations that might have version-specific features enabled.
