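The Kafka controller is the brain of the cluster, responsible for managing partitions, leaders, and replicas. If the controller goes down, existing partition leaders keep serving produce and fetch requests. However, the cluster cannot make metadata changes, such as electing new partition leaders or creating topics, until a new controller is elected.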

How Leaders Are Chosen

In ZooKeeper-based deployments, Kafka uses Apache ZooKeeper to coordinate controller election. When a broker starts up, it attempts to become the controller by creating an ephemeral ZNode at /controller. This is a classic distributed-systems pattern for leader election. ZooKeeper creates a node atomically, so only one broker can succeed, and that broker becomes the leader.

Here’s a simplified look at the process:

  1. Broker Starts: A Kafka broker starts and connects to ZooKeeper.
  2. Attempt to Become Controller: The broker tries to create an ephemeral ZNode at /controller.
  3. Become Controller: If the node doesn’t exist, the broker creates it and becomes the controller. It writes its broker ID and a timestamp into this ZNode. If the node already exists, the broker sets a watch on it and remains an ordinary broker.
  4. Controller Failure: If the current controller broker fails or restarts, its ZooKeeper session ends. This happens immediately on a clean shutdown, or after the session timeout on a crash. ZooKeeper then automatically deletes the broker’s ephemeral ZNode.
  5. New Election: The other brokers, which hold a watch on /controller, are notified of the deletion and race to create the node. The first one to succeed becomes the new controller.
  6. Leader Election for Partitions: Once elected, the controller ensures each partition has a leader replica. For a partition whose leader is offline, it walks the partition’s replica assignment in order. It picks the first replica that is on a live broker and in the in-sync replica (ISR) set.
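The steps above can be sketched as a toy simulation. This is a minimal sketch, not Kafka’s or ZooKeeper’s code. ToyZooKeeper, Broker, and current_controller are hypothetical names. The in-memory store models only the three ZooKeeper features the election relies on: atomic create-if-absent of ephemeral nodes, one-shot deletion watches, and removal of a session’s ephemeral nodes when the session ends.

```python
"""Toy simulation of ZooKeeper-based controller election.

ToyZooKeeper, Broker, and current_controller are hypothetical names.
ToyZooKeeper is an in-memory stand-in, not the ZooKeeper client API.
"""
import json
import time
from typing import Callable, Dict, List, Optional


class NodeExistsError(Exception):
    """Raised when creating a node that already exists."""


class ToyZooKeeper:
    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.owner: Dict[str, int] = {}  # path -> owning session id
        self.watches: Dict[str, List[Callable[[], None]]] = {}

    def create_ephemeral(self, path: str, value: bytes, session: int) -> None:
        # Atomic create-if-absent: exactly one caller can win.
        if path in self.data:
            raise NodeExistsError(path)
        self.data[path] = value
        self.owner[path] = session

    def get(self, path: str) -> Optional[bytes]:
        return self.data.get(path)

    def watch_deletion(self, path: str, callback: Callable[[], None]) -> None:
        # One-shot watch: it is removed when it fires.
        self.watches.setdefault(path, []).append(callback)

    def close_session(self, session: int) -> None:
        # Ending a session deletes its ephemeral nodes and fires their watches.
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.data[path]
            del self.owner[path]
            for callback in self.watches.pop(path, []):
                callback()


class Broker:
    def __init__(self, broker_id: int, zk: ToyZooKeeper) -> None:
        self.broker_id = broker_id
        self.zk = zk
        self.session = broker_id  # toy: one session per broker
        self.alive = True

    def try_elect(self) -> None:
        if not self.alive:
            return
        payload = json.dumps({
            "version": 1,
            "brokerid": self.broker_id,
            "timestamp": str(int(time.time() * 1000)),
        }).encode()
        try:
            self.zk.create_ephemeral("/controller", payload, self.session)
        except NodeExistsError:
            # Lost the race: wait for the node to disappear, then retry.
            self.zk.watch_deletion("/controller", self.try_elect)

    def crash(self) -> None:
        self.alive = False
        self.zk.close_session(self.session)


def current_controller(zk: ToyZooKeeper) -> Optional[int]:
    raw = zk.get("/controller")
    return None if raw is None else json.loads(raw)["brokerid"]
```

Calling try_elect on brokers 1, 2, and 3 in that order makes broker 1 the controller. Crashing broker 1 then hands control to broker 2, and crashing broker 2 hands it to broker 3. In real ZooKeeper, the winner of each race is whichever create request the ensemble orders first. The toy fires watches in registration order, so its outcome is deterministic.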

This ZooKeeper-based election is robust but introduces a dependency on ZooKeeper. If ZooKeeper is unavailable, Kafka cannot elect a controller or persist metadata changes. As a result, the cluster cannot recover from broker failures until ZooKeeper returns.

The Controller’s Responsibilities

The controller is a busy component. Its core duties include:

  • Leader Election: For each partition, deciding which replica will be the leader. This is crucial for ensuring only one broker is actively handling reads and writes for a given partition.
  • Replica Management: Monitoring the health of all replicas for each partition. If a leader fails, the controller initiates a leader re-election among the in-sync replicas.
  • Topic Operations: Handling requests for creating, deleting, or altering topics. The controller is the single source of truth for metadata about topics and partitions.
  • Broker Registration: Keeping track of which brokers are alive and registered with the cluster.
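The leader-election duty can be illustrated with a short sketch. It follows Kafka’s default offline-partition election rule as we understand it: take the first replica in the partition’s assignment order that is both on a live broker and in the ISR. The function name and parameters are hypothetical. The unclean_leader_election flag stands in for the unclean.leader.election.enable setting, which lets a live but out-of-sync replica take over at the risk of losing acknowledged writes.

```python
from typing import Optional, Sequence, Set


def elect_partition_leader(
    assignment: Sequence[int],
    isr: Set[int],
    live_brokers: Set[int],
    unclean_leader_election: bool = False,
) -> Optional[int]:
    """Pick a new leader for one partition (hypothetical helper).

    Walks the replica assignment in order and returns the first replica
    that is both on a live broker and in the ISR.
    """
    for replica in assignment:
        if replica in live_brokers and replica in isr:
            return replica
    if unclean_leader_election:
        # Fall back to any live replica, which may be missing writes.
        for replica in assignment:
            if replica in live_brokers:
                return replica
    return None  # no eligible replica: the partition stays offline


# Broker 1 (the preferred replica) is down; broker 2 is next in line.
print(elect_partition_leader([1, 2, 3], isr={1, 2, 3}, live_brokers={2, 3}))
```

Because the walk follows assignment order, the first replica in the list (the preferred replica) wins whenever it is eligible. This keeps leadership spread the way the assignment intended.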

Seeing it in Action: Controller ZooKeeper Node

You can observe the controller election process by looking at the /controller ZNode in ZooKeeper.

First, ensure your Kafka cluster is running and connected to ZooKeeper. You’ll need a ZooKeeper shell. zkCli.sh ships in the bin directory of the ZooKeeper distribution, and Kafka bundles an equivalent, zookeeper-shell.sh, in its own bin directory.

Let’s say your ZooKeeper is running on localhost:2181. You can connect and see the controller information:

./bin/zkCli.sh -server localhost:2181

Once connected, you can list the contents of the /controller path:

ls /controller

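If a controller is active, the command succeeds and prints an empty list. This is because /controller is a leaf ZNode that stores data rather than children. If no controller exists, zkCli reports that the node does not exist.

[zk: localhost:2181(CONNECTED) 0] ls /controller
[]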

Now, to see the data stored in the controller node, use the get command. On ZooKeeper 3.5 and later, add -s to print the node’s metadata as well. On 3.4, a plain get prints both.

get -s /controller

The output will look something like this:

{"version":1,"brokerid":1,"timestamp":"1678886400000"}
cZxid = 0x100000003
ctime = Wed Mar 15 13:20:00 UTC 2023
mZxid = 0x100000003
mtime = Wed Mar 15 13:20:00 UTC 2023
pZxid = 0x100000003
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x1000000000001
dataLength = 54
numChildren = 0

Here:

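  • brokerid: The ID of the broker currently acting as the controller. In this example, it’s broker 1.
  • timestamp: The time this broker created the node, that is, when it became controller, in milliseconds since the Unix epoch. For example, 1678886400000 is 2023-03-15 13:20:00 UTC.
  • ephemeralOwner: The ID of the ZooKeeper session that owns the node. A nonzero value marks the node as ephemeral, and ZooKeeper deletes it when that session ends.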
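If you script against this node, you will usually want the timestamp as a real date. Below is a minimal sketch, assuming the JSON layout shown in the get output. parse_controller_znode is a hypothetical helper, not part of Kafka’s tooling.

```python
import json
from datetime import datetime, timezone


def parse_controller_znode(raw: str) -> dict:
    """Decode the JSON stored in /controller (hypothetical helper)."""
    payload = json.loads(raw)
    # The timestamp is a string holding milliseconds since the Unix epoch.
    millis = int(payload["timestamp"])
    return {
        "version": payload["version"],
        "broker_id": payload["brokerid"],
        "elected_at": datetime.fromtimestamp(millis / 1000, tz=timezone.utc),
    }


info = parse_controller_znode('{"version":1,"brokerid":1,"timestamp":"1678886400000"}')
print(info["broker_id"], info["elected_at"].isoformat())
# prints: 1 2023-03-15T13:20:00+00:00
```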

If you stop the broker with brokerid 1 and run get /controller again, you may briefly find that the node no longer exists. For a crashed broker, this lasts until its ZooKeeper session times out. Shortly after, another broker creates the node, and the output shows a new brokerid and timestamp.

The Counterintuitive Truth About Controller Failover

Most people assume that when a Kafka controller fails, the cluster simply waits for a new one to be elected. The counterintuitive part is how easily ZooKeeper itself can stall that process. ZooKeeper is designed for high availability. However, if its ensemble loses quorum (for example, during a network partition) or is busy electing its own leader, it cannot accept writes such as the creation of /controller. That can delay or even halt Kafka’s controller election. A healthy Kafka cluster relies on an even healthier ZooKeeper ensemble, and ZooKeeper problems often surface as "Kafka is down" even when the brokers themselves are perfectly fine.
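Why a ZooKeeper outage or partition can stall the election comes down to quorum arithmetic. ZooKeeper commits a write, such as the create that makes a broker controller, only when a majority of the ensemble acknowledges it. Below is a small sketch of that arithmetic; the function names are hypothetical.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of a ZooKeeper ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


def can_commit_writes(reachable_servers: int, ensemble_size: int) -> bool:
    """True if the servers that can reach each other still form a majority."""
    return reachable_servers >= quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Running it prints 3 2 1, then 4 3 1, then 5 3 2. A four-server ensemble tolerates no more failures than a three-server one, which is why ensembles are usually sized with an odd number of servers. During a partition, only the side that holds a majority can keep committing writes.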

The next logical step after understanding controller election is to dive into how partitions are assigned leaders once a controller is active, and the nuances of the In-Sync Replica (ISR) set.
