The Kafka broker responsible for a partition is not available or is unable to serve leadership requests, causing producers and consumers to fail.
Common Causes and Fixes for NotLeaderForPartitionException
This error means that the Kafka broker a client is trying to communicate with for a specific topic partition doesn’t think it’s the leader for that partition. This can happen for a variety of reasons, but the core issue is that the partition’s leadership is in an unstable state.
- Under-Replicated Partitions (Most Common): This is the classic symptom. A partition’s leader is available, but one or more of its replicas are down or lagging significantly, so the in-sync replica set (ISR) is smaller than the full replica set. When the current leader becomes unavailable, Kafka’s controller, which manages partition leadership, tries to elect a new leader from the ISR. With a shrunken ISR there are fewer candidates, and if no in-sync replica is left the controller cannot elect a leader at all unless unclean leader election is enabled.
  - Diagnosis: Check the replication status of your topics.

        kafka-topics.sh --bootstrap-server kafka-broker-1:9092 --describe --topic your_topic_name

    Look for partitions where the Leader is present but Replicas lists fewer brokers than expected, or where Isr (In-Sync Replicas) lists fewer brokers than Replicas. For example, if a partition shows Replicas: 0,1,2 and Isr: 0,1, it is under-replicated because broker 2 has fallen out of the ISR.
  - Fix: Identify the missing replicas and bring them back online. If a broker is down, restart it. If a replica is lagging, it might need to be reassigned or have its log cleared.

        # Example: If broker 2 is down for topic_name partition 0
        # Restart broker 2. Once it rejoins the cluster and catches up,
        # the ISR will update automatically.

    If a broker is persistently problematic, you might need to remove it from the cluster and re-add it, or reassign the partition to healthier brokers.
  - Why it works: By default, Kafka elects a new leader only from the in-sync replicas so that acknowledged data is not lost. Bringing the missing replicas back gives the controller healthy candidates again. Until then, leadership can stall or move repeatedly, and clients holding stale metadata keep sending requests to a broker that no longer leads the partition, which surfaces as NotLeaderForPartitionException.
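The ISR check described above can be automated when a topic has many partitions. The following is a minimal sketch, assuming the per-partition lines printed by `kafka-topics.sh --describe` contain `Topic:`, `Partition:`, `Replicas:` and `Isr:` fields; the function name `find_under_replicated` is hypothetical, and the exact output layout can differ between Kafka versions.

```python
import re

# Assumption: each partition line contains "Name: value" fields such as
# "Partition: 0", "Replicas: 0,1,2" and "Isr: 0,1". Fields are matched
# individually so that extra columns are simply ignored.
_FIELD = re.compile(r"\b(Topic|Partition|Leader|Replicas|Isr): ?(\S*)")


def find_under_replicated(describe_output):
    """Return (topic, partition, missing_broker_ids) for every partition
    whose ISR is smaller than its replica set."""
    results = []
    for line in describe_output.splitlines():
        fields = dict(_FIELD.findall(line))
        if "Partition" not in fields or "Replicas" not in fields:
            continue  # topic summary line or unrelated output
        replicas = [b for b in fields["Replicas"].split(",") if b]
        isr = {b for b in fields.get("Isr", "").split(",") if b}
        missing = [b for b in replicas if b not in isr]
        if missing:
            results.append((fields.get("Topic", ""), int(fields["Partition"]), missing))
    return results
```

Feeding it the captured output of the describe command returns one tuple per under-replicated partition, with the broker ids that have dropped out of the ISR. `kafka-topics.sh` also accepts `--under-replicated-partitions` together with `--describe`, which filters the listing on the Kafka side.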
- Broker Restart or Unavailability: If the current leader broker for a partition restarts or becomes unreachable, clients will receive this error until a new leader is elected. This can be a temporary hiccup during rolling restarts or an indicator of a more serious broker failure.
  - Diagnosis: Check broker logs for signs of restarts, crashes, or network issues. Use Kafka’s JMX metrics to monitor broker health and connectivity.

        # On the affected broker machine, check system logs
        sudo journalctl -u kafka -f
        # Or check Kafka's own logs
        tail -f /path/to/kafka/logs/server.log

  - Fix: Ensure all brokers are running and healthy. If a broker crashed, investigate the cause (e.g., out of memory, disk full, network partition) and resolve it. Once the broker is back and has rejoined the cluster, leadership should stabilize.
  - Why it works: When a leader broker goes down, the Kafka controller detects this and initiates a leader election among the available in-sync replicas for that partition. Once a new leader is elected and clients are aware of it, the error subsides.
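The election step described above can be illustrated with a small model. This is a simplified sketch, not the controller's actual code: it assumes the controller picks the first replica in assignment order that is both alive and in the ISR, and falls back to any live replica only when unclean election is allowed. The function name `elect_leader` and its parameters are hypothetical.

```python
def elect_leader(replicas, isr, live_brokers, allow_unclean=False):
    """Pick a new leader for one partition.

    replicas:      broker ids in assignment order (preferred replica first)
    isr:           broker ids currently in sync
    live_brokers:  broker ids that are currently up
    Returns the new leader id, or None if the partition stays leaderless.
    """
    live = set(live_brokers)
    in_sync = set(isr)
    # Clean election: only a live, in-sync replica may become leader.
    for broker in replicas:
        if broker in live and broker in in_sync:
            return broker
    # Unclean election (assumption: opt-in): any live replica, at the
    # risk of losing records the old leader had already acknowledged.
    if allow_unclean:
        for broker in replicas:
            if broker in live:
                return broker
    return None
```

For example, with replicas `[0, 1, 2]`, a full ISR, and broker 0 down, the sketch returns broker 1. If the ISR had shrunk to `[0]` before broker 0 failed, it returns `None`, which corresponds to a partition that stays leaderless until broker 0 comes back.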
- Network Partitions: If network issues cause a broker to be temporarily isolated from the Kafka controller or other brokers, it might lose its leadership status or be unable to serve requests, even if it’s technically running.
  - Diagnosis: Use ping or traceroute from other brokers to the affected broker and vice versa. Check firewall rules and network device logs.

        ping kafka-broker-X

  - Fix: Resolve the underlying network connectivity issues. This might involve reconfiguring firewalls, correcting routing problems, or addressing issues with network hardware.
  - Why it works: Kafka relies on reliable inter-broker communication. Network partitions disrupt this communication, leading the controller to perceive a broker as unavailable and potentially trigger leadership changes, or preventing clients from reaching the actual leader.
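ping only tests ICMP reachability, so a firewall can let it through while still blocking the broker port. A TCP-level check complements it. The sketch below uses only the Python standard library; the helper name `can_reach` is hypothetical, and port 9092 in the usage note is an assumption taken from the examples in this article.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Running `can_reach("kafka-broker-X", 9092)` from each of the other brokers, and the reverse from the affected broker, shows whether the listener port is reachable in both directions.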
- Controller Leader Election: If the Kafka controller itself is undergoing an election (e.g., the controller broker restarts), any partition that loses its leader during that window stays leaderless until a controller is back, resulting in NotLeaderForPartitionException.
  - Diagnosis: Examine the server.log on your Kafka brokers for messages indicating controller elections. Look for lines like "KafkaController is starting" or "Electing new controller"; the exact wording depends on your Kafka version.

        grep "Electing new controller" /path/to/kafka/logs/server.log

  - Fix: Controller elections are usually brief. If they are frequent or prolonged, it indicates a problem with the controller broker or cluster stability. Ensure the designated controller brokers are healthy and have stable network connectivity.
  - Why it works: The controller is the brain of the Kafka cluster, managing partition leadership. If the controller is unavailable, no leadership changes can occur, so a partition whose leader fails in the meantime has no broker to hand leadership to, and clients keep seeing this error until a controller is elected and assigns a new leader.
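To judge whether controller elections are "frequent", it helps to count them per hour rather than read raw grep output. The following is a rough sketch under two assumptions: log lines start with a timestamp such as `[2024-05-01 12:34:56,789]`, and the search strings are the ones quoted in this article, which may not match the messages your Kafka version emits. The function name `controller_events_per_hour` is hypothetical.

```python
import re
from collections import Counter

# Assumption: lines begin with "[YYYY-MM-DD HH:MM:SS,mmm]"; only the
# date and hour are captured so that events are bucketed per hour.
_HOUR_PREFIX = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2})")

# Assumption: message fragments taken from the article; adjust them to
# whatever your brokers actually log.
DEFAULT_PATTERNS = ("Electing new controller", "KafkaController is starting")


def controller_events_per_hour(log_lines, patterns=DEFAULT_PATTERNS):
    """Count log lines containing any pattern, grouped by date and hour."""
    counts = Counter()
    for line in log_lines:
        if any(p in line for p in patterns):
            match = _HOUR_PREFIX.match(line)
            counts[match.group(1) if match else "unknown"] += 1
    return dict(counts)
```

Passing an open file handle for server.log works because the function only iterates over lines. Several events in the same hour on an otherwise quiet cluster suggest the instability described above.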
- Topic Reassignment or Decommissioning: If partitions are being reassigned (e.g., using kafka-reassign-partitions.sh) or if brokers are being decommissioned, leadership can temporarily shift or become unavailable during the process.
  - Diagnosis: Check the status of any ongoing partition reassignments or broker decommissioning operations. Review the output of the reassignment tool.

        # If using kafka-reassign-partitions.sh with --execute
        # Check the status of the assignment, often logged to a file.
        # If you don't have a log, you might need to re-run with --verify
        # after a sufficient waiting period.

  - Fix: Allow reassignment operations to complete successfully. Monitor the process and ensure all brokers involved are healthy and reachable throughout. If an operation fails, troubleshoot and retry.
  - Why it works: During partition reassignments, leadership is explicitly moved. If the process is interrupted or if a broker involved in the reassignment becomes unavailable, it can leave partitions in a state where no leader is recognized.
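kafka-reassign-partitions.sh reads its plan from a JSON file passed with `--reassignment-json-file`, and the same file is used again with `--verify`. Keeping that file around makes a failed reassignment easier to retry. The sketch below generates such a plan; the helper name `build_reassignment_plan` and the topic and broker ids in the usage note are hypothetical.

```python
import json


def build_reassignment_plan(moves):
    """Build the JSON plan for kafka-reassign-partitions.sh.

    moves: iterable of (topic, partition, replica_broker_ids) tuples,
           where the first broker id is the preferred leader.
    """
    plan = {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": list(replicas)}
            for topic, partition, replicas in moves
        ],
    }
    return json.dumps(plan, indent=2)
```

For instance, `build_reassignment_plan([("your_topic_name", 0, [0, 1, 3])])` produces a plan that moves partition 0 onto brokers 0, 1 and 3. Writing the result to a file and passing that file to `--execute`, then later to `--verify`, gives a repeatable record of what was requested.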
- Incorrect advertised.listeners or listeners Configuration: If a broker’s listener configuration is incorrect, clients might be trying to connect to the wrong address or port, or the broker might not be advertising its correct network identity, leading to communication failures.
  - Diagnosis: Verify listeners and advertised.listeners in server.properties on all brokers. Ensure they are set to the correct, resolvable network interfaces and ports.

        # Example correct configuration for a broker with IP 192.168.1.100
        listeners=PLAINTEXT://0.0.0.0:9092
        advertised.listeners=PLAINTEXT://192.168.1.100:9092

  - Fix: Correct the listeners and advertised.listeners configuration to reflect the actual network interfaces and addresses the brokers should use for inter-broker and client communication. Restart brokers after making changes.
  - Why it works: Clients (producers, consumers, other brokers) discover partition leaders via metadata. If advertised.listeners is wrong, clients will be directed to an inaccessible address, or the broker won’t be able to communicate its leadership status correctly.
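Two listener mistakes are easy to catch mechanically: advertising the wildcard address 0.0.0.0, which clients cannot connect to, and advertising a listener name that is not defined in listeners. The following is a minimal sketch of such a check on the text of server.properties. All function names are hypothetical, and the parser handles only simple `key=value` lines, not every properties-file feature.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def parse_listeners(value):
    """Map listener name -> (host, port) from 'NAME://host:port,...'."""
    result = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, address = entry.partition("://")
        host, _, port = address.rpartition(":")
        result[name] = (host, port)
    return result


def check_listener_config(text):
    """Return a list of human-readable problems; empty means none found."""
    props = parse_properties(text)
    listeners = parse_listeners(props.get("listeners", ""))
    advertised = parse_listeners(props.get("advertised.listeners", ""))
    problems = []
    for name, (host, _port) in advertised.items():
        if name not in listeners:
            problems.append(f"{name}: advertised but not defined in listeners")
        if host == "0.0.0.0":
            problems.append(f"{name}: advertises 0.0.0.0, which clients cannot reach")
    return problems
```

Running `check_listener_config` on the example configuration above returns an empty list. It does not verify that the advertised host actually resolves or is reachable from clients, so it complements rather than replaces the manual review.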
If you clear NotLeaderForPartitionException without addressing the root cause, the next error you’ll likely encounter is LeaderNotAvailableException, a related error indicating that the partition currently has no leader, typically because no suitable one could be elected.