The Kafka producer is blocking because its internal buffer is full, and send() can’t accept any more records until space frees up.

This is almost always caused by a mismatch between how fast records are being produced and how fast they are being acknowledged by the Kafka brokers. The producer has a buffer.memory setting (32 MB by default), a hard limit on the total memory the producer will use to buffer records that haven’t been sent to the broker yet. When this buffer fills up, send() calls block for up to max.block.ms (60 seconds by default), waiting for space to become available. If no space frees up in that time, the send fails with a TimeoutException.
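
To get a feel for how quickly this happens, the back-of-the-envelope calculation below estimates how long the buffer lasts when records arrive faster than they are acknowledged. It is a rough sketch: BufferFillEstimate is a made-up helper rather than anything in the Kafka client, the rates are hypothetical, and it ignores batching overhead and compression.

```java
// Rough estimate only: BufferFillEstimate is a hypothetical helper, not part of Kafka.
final class BufferFillEstimate {

    /**
     * Seconds until a buffer of bufferBytes fills, given how fast the application
     * appends bytes and how fast acknowledged bytes are released.
     * Returns positive infinity when the drain rate keeps up with the produce rate.
     */
    static double secondsUntilFull(long bufferBytes, double produceBytesPerSec, double drainBytesPerSec) {
        double netFillRate = produceBytesPerSec - drainBytesPerSec;
        if (netFillRate <= 0) {
            return Double.POSITIVE_INFINITY; // the buffer never fills
        }
        return bufferBytes / netFillRate;
    }

    public static void main(String[] args) {
        long bufferMemory = 32L * 1024 * 1024; // 32 MB, the default buffer.memory
        double produce = 50.0 * 1024 * 1024;   // assumed: application writes 50 MB/s
        double drain = 42.0 * 1024 * 1024;     // assumed: brokers acknowledge 42 MB/s
        System.out.println(secondsUntilFull(bufferMemory, produce, drain) + " s until send() blocks");
    }
}
```

With these made-up rates the default buffer lasts about four seconds, which is why even a modest sustained mismatch shows up as blocking almost immediately.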

Here are the most common reasons and how to fix them:

1. Broker Throughput Limit Exceeded

The most frequent culprit is that the Kafka brokers simply can’t keep up with the rate at which your producer is trying to send data. This could be due to network saturation on the broker side, disk I/O bottlenecks on the broker, or insufficient broker CPU.

Diagnosis: On the Kafka broker, monitor disk I/O (iostat -xz 1), network traffic (iftop -i eth0), and CPU usage (top or htop). Look for sustained high utilization (e.g., disk %util at 100%, high network bandwidth, or CPU over 80%). Also, check broker logs for messages indicating slow disk writes or network issues.

Fix:

  • Scale up brokers: Increase the number of brokers in your cluster or upgrade their hardware (faster CPUs, more RAM, faster disks like NVMe SSDs).
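  • Increase partitions: If your topic has too few partitions, its traffic concentrates on a few brokers and disks, and a single broker can be overwhelmed. Adding partitions spreads it out. The --partitions flag takes the new total, not an increment, so to grow a topic named my-topic from 10 to 20 partitions:
    kafka-topics.sh --bootstrap-server broker1:9092 --alter --topic my-topic --partitions 20
    
    This allows Kafka to distribute the load across more brokers and more disk threads. Be aware that adding partitions changes which partition a given key hashes to, so per-key ordering is not preserved across the change.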
  • Optimize producer batch.size and linger.ms: If you’re sending many small messages, try increasing batch.size so more records are batched together per partition before sending, which reduces per-request overhead. Similarly, increasing linger.ms (e.g., to 100 ms) gives the producer more time to accumulate records into a batch, improving throughput at the cost of up to that much added latency per record.

Why it works: This addresses the fundamental bottleneck by either increasing the capacity of the brokers to ingest data or by making the producer send data more efficiently.
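
As a sketch of the batching settings mentioned in the last bullet, the snippet below builds the relevant properties with plain java.util.Properties. The class name is hypothetical and the values are starting points to experiment with, not recommendations.

```java
import java.util.Properties;

// Sketch only: BatchingConfig is a hypothetical helper and the values are assumptions.
final class BatchingConfig {

    static Properties batchingProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9092"); // broker address reused from the command above
        props.setProperty("batch.size", "65536"); // 64 KB per-partition batch (the default is 16384)
        props.setProperty("linger.ms", "100");    // wait up to 100 ms for a batch to fill
        return props;
    }

    public static void main(String[] args) {
        batchingProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would be passed to the KafkaProducer constructor together with the required key.serializer and value.serializer settings.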

2. Network Latency or Bandwidth Issues

High network latency between the producer and the brokers, or insufficient network bandwidth, can cripple the producer’s ability to send data quickly.

Diagnosis: Use ping and traceroute from the producer machine to the broker to check latency and packet loss. On the producer machine, monitor its network interface’s outgoing bandwidth usage.

Fix:

  • Improve network infrastructure: Ensure sufficient bandwidth between producer and broker networks. This might involve upgrading network cards, switches, or network links.
  • Reduce network hops: If possible, colocate producers and brokers on the same high-speed network.
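  • Tune TCP settings: On the producer’s OS, tune TCP buffer sizes and other network parameters. This is OS-specific but can sometimes help. The producer’s own send.buffer.bytes setting controls the socket send buffer and may need raising on high-latency links.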

Why it works: This ensures that data can physically travel from the producer to the broker at a sufficient speed, removing a physical transport bottleneck.

3. Producer buffer.memory Too Small

The buffer.memory setting might simply be too small for the producer’s intended throughput, even if brokers are healthy.

Diagnosis: Check the producer’s buffer.memory configuration. If it’s at or below the 32MB default (33554432 bytes) and your producer is attempting to send data at a high rate, this is a likely cause.

Fix: Increase buffer.memory. The optimal value depends on your expected throughput and message size. Doubling it to 64MB is a reasonable first step, but for high-throughput scenarios, 128MB, 256MB, or even 512MB might be necessary. The buffer is allocated from the JVM heap, so make sure the heap has room for it.

# Example producer.properties
# 128MB (keep comments on their own line; a trailing "# ..." would become part of the value)
buffer.memory=134217728

Why it works: A larger buffer lets the producer queue up more records before blocking, which absorbs bursts and short broker slowdowns. It does not help if the sustained produce rate exceeds what the brokers can acknowledge.
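
One way to pick a value is to work backwards from the longest broker slowdown you want to ride out. The helper below is a hypothetical sketch of that arithmetic; the throughput and stall duration are assumptions, and it treats the brokers as acknowledging nothing during the stall.

```java
// Hypothetical sizing helper; the throughput and stall figures are assumptions.
final class BufferSizing {

    /** Bytes needed to absorb stallSeconds of unacknowledged traffic at the given produce rate. */
    static long bytesToAbsorbStall(double produceBytesPerSec, double stallSeconds) {
        return (long) Math.ceil(produceBytesPerSec * stallSeconds);
    }

    public static void main(String[] args) {
        double produce = 40.0 * 1024 * 1024; // assumed: application writes 40 MB/s
        double stall = 3.0;                  // assumed: tolerate a 3 second broker pause
        long needed = bytesToAbsorbStall(produce, stall);
        System.out.println("buffer.memory=" + needed);
    }
}
```

With these assumed numbers the result is 125829120 bytes (120MB), which fits under the 128MB value in the example above.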

4. Producer acks Setting Too High

The acks setting controls how many replicas must acknowledge a record before the producer considers it successful. Setting acks=all (or -1) waits for every in-sync replica, which is the safest but slowest option. If brokers are struggling to respond quickly, batches stay in the buffer longer while they await acknowledgment, so the buffer fills sooner and the producer blocks.

Diagnosis: Check the producer’s acks configuration. If it’s set to all and you’re experiencing blocking, this is a strong indicator.

Fix:

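  • Lower acks: Change acks to 1, so that only the leader broker must acknowledge the record. This is a common balance between durability and performance, though a record acknowledged by the leader can still be lost if the leader fails before followers replicate it.
  • Consider acks=0: If you can tolerate losing data (e.g., during broker failures), setting acks=0 is the fastest option, as the producer doesn’t wait for any acknowledgment. It does not make send() strictly non-blocking: send() can still block if the network itself cannot drain the buffer.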

Why it works: Reducing the number of required acknowledgments means the producer receives confirmation of success faster, allowing it to clear its buffer more quickly.
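
One thing worth checking before lowering acks: the idempotent producer (enable.idempotence=true) requires acks=all, so a configuration that explicitly sets both is rejected by the client. The pre-flight check below is an illustrative sketch using plain java.util.Properties. AcksCheck is a hypothetical helper, not a Kafka API, and it only looks at values set explicitly.

```java
import java.util.Properties;

// Illustrative pre-flight check; AcksCheck is a hypothetical helper, not a Kafka API.
final class AcksCheck {

    /** True when the properties explicitly enable idempotence but set acks to something other than all or -1. */
    static boolean conflictsWithIdempotence(Properties props) {
        boolean idempotent = Boolean.parseBoolean(props.getProperty("enable.idempotence", "false"));
        String acks = props.getProperty("acks", "all");
        boolean acksAll = acks.equals("all") || acks.equals("-1");
        return idempotent && !acksAll;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "1");
        System.out.println("conflict: " + conflictsWithIdempotence(props));
    }
}
```

If the check reports a conflict, either keep acks=all or disable idempotence deliberately, accepting that retries may then produce duplicates.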

5. High Latency in RecordAccumulator Processing

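The RecordAccumulator is the internal component managing the producer’s buffer: send() appends records to per-partition batches in it, and a background I/O thread drains those batches to the brokers. Custom code can slow both sides. Serializers and the interceptor onSend() method run on the calling thread inside send(), so slow ones make send() itself slow. Send callbacks and the interceptor onAcknowledgement() method run on the I/O thread, so slow ones stall the thread that frees buffer space. This is less common but can happen with complex interceptors, callbacks, or custom serializers.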

Diagnosis: This is harder to diagnose directly. If you’ve ruled out network and broker issues, and buffer.memory is sufficient, look for slow custom code in your producer’s interceptors or serializers. You might need to profile the producer application.

Fix:

  • Optimize custom code: If custom serializers or interceptors are used, profile them for performance bottlenecks.
  • Reduce max.request.size: This is a long shot, but a very large max.request.size permits very large requests that take longer to transfer, which can indirectly slow how quickly the buffer drains. If you have extremely large messages and have raised it well above the 1MB default, try reducing it.

Why it works: Ensures that the internal machinery of the producer can efficiently manage its buffers and prepare records for sending.

6. Insufficient Producer Threads / Blocking Producer Code

If your application code is blocking the thread that calls producer.send(), it can appear as if the producer buffer is full, even if it’s not. send() is always asynchronous and returns a Future; it becomes effectively synchronous when the code calls future.get() after every send, which parks the thread until the broker acknowledges that record. General thread contention in the application produces the same symptom.

Diagnosis: Profile your producer application. Look for threads that are stuck in producer.send() or future.get(). Check for general thread exhaustion in your application.

Fix:

  • Use asynchronous send(): Use producer.send(record, callback) and implement a robust callback to handle errors and acknowledgments. Avoid calling future.get() after each send. Keep the callback fast, since it runs on the producer’s I/O thread.
  • Increase producer threads: KafkaProducer is thread-safe, so for a very high-throughput producer you can call send() on one shared instance from a pool of threads. More threads do not make send() non-blocking when the buffer is full; they only keep a single slow caller from limiting the produce rate.

Why it works: Prevents your application’s own thread management from becoming the bottleneck, allowing the producer client to operate efficiently.
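
To see why calling future.get() after every send() hurts, consider the ceiling it imposes: each thread can have only one record in flight, so its rate is bounded by the acknowledgment latency. The model below is a back-of-the-envelope sketch; the class name is hypothetical and the latency figure is an assumption.

```java
// Back-of-the-envelope model; the latency figure is an assumption for illustration.
final class SyncSendCeiling {

    /** Upper bound on records per second for one thread that waits for each acknowledgment before sending again. */
    static double maxRecordsPerSecond(double roundTripMillis) {
        return 1000.0 / roundTripMillis;
    }

    public static void main(String[] args) {
        double roundTrip = 5.0; // assumed: 5 ms from send() to acknowledgment
        System.out.println(maxRecordsPerSecond(roundTrip) + " records/s per thread at most");
    }
}
```

At an assumed 5 ms per acknowledgment, a thread that waits on every record tops out at 200 records per second, no matter how much capacity the brokers have. Asynchronous sends remove that ceiling by letting many records be in flight at once.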

After fixing these, the next error you’ll likely encounter is a TimeoutException, raised when send() waits longer than max.block.ms for buffer space or when a record is not acknowledged within delivery.timeout.ms. You may also see an OutOfMemoryError if you’ve increased buffer.memory too aggressively without addressing the underlying throughput issues.
