Kafka Batch Size and linger.ms: Maximize Throughput (2026)

Kafka producers don’t just send messages one by one; they batch them up to reduce network overhead. The batch.size and linger.ms settings are your primary levers for tuning this batching behavior to maximize throughput.

Let’s see this in action. The producer below sends 100,000 small messages to a topic as fast as it can.

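import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducerExample {
    public static void main(String[] args) {
        // Producer configuration
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Tuning parameters
        props.put("batch.size", 16384); // upper bound on one partition's batch, in bytes (16 KB)
        props.put("linger.ms", 5);      // wait up to 5 ms for a batch to fill

        // try-with-resources calls close(), which waits for buffered records to be sent
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100000; i++) {
                String message = "Message-" + i;
                // send() is asynchronous: it appends to a batch and returns immediately
                producer.send(new ProducerRecord<>("my-topic", message));
            }
        }
    }
}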

In this snippet, batch.size is set to 16 KB and linger.ms to 5 ms. Both settings apply per partition: the producer holds a partition’s batch until it has waited 5 ms for more messages or accumulated 16 KB of data, whichever comes first, and then sends it to the Kafka broker.
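To get a feel for which of the two limits fires first, a rough estimate helps. The sketch below assumes a steady 1,000 messages per second of about 100 bytes each, all going to a single partition; these workload numbers and the class name BatchFillEstimate are hypothetical, and the arithmetic ignores record overhead and compression.

```java
// Back-of-the-envelope estimate; all workload numbers are hypothetical.
final class BatchFillEstimate {

    /** Bytes that arrive for one partition during a single linger window. */
    static long bytesPerLingerWindow(long messagesPerSecond, long bytesPerMessage, long lingerMs) {
        return messagesPerSecond * bytesPerMessage * lingerMs / 1000;
    }

    /** True if the size limit is reached before the linger timer expires. */
    static boolean fillsBeforeLinger(long messagesPerSecond, long bytesPerMessage,
                                     long lingerMs, long batchSizeBytes) {
        return bytesPerLingerWindow(messagesPerSecond, bytesPerMessage, lingerMs) >= batchSizeBytes;
    }

    public static void main(String[] args) {
        // 1,000 msg/s * 100 bytes * 5 ms = 500 bytes per window, far below 16 KB
        System.out.println("bytes per 5 ms window: " + bytesPerLingerWindow(1_000, 100, 5));
        System.out.println("fills 16 KB first: " + fillsBeforeLinger(1_000, 100, 5, 16_384));
    }
}
```

Under these assumptions only about 500 bytes accumulate per window, so linger.ms, not batch.size, decides when each batch is sent. A much heavier stream reaches the size limit first.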

The core problem this batching solves is per-request overhead. The producer keeps its broker connections open, but sending each message individually would still mean a separate request, with its own headers and its own acknowledgment round trip, for every single message. This is incredibly inefficient, especially for high-volume producers. By grouping messages into batches, the producer amortizes this overhead across many messages, significantly increasing the number of messages that can be sent per second.

Internally, the producer maintains a buffer (a queue of batches) for each topic-partition it’s sending to. When send() is called, the message is serialized, assigned a partition, and appended to the current batch for that partition; send() itself returns immediately. The producer’s background sender thread then watches these buffers. Once a batch reaches batch.size, or linger.ms has elapsed since the batch was created, the batch is ready. The thread groups the ready batches by the broker that leads each partition and sends them in a single produce request per broker. Acknowledgments from the broker then tell the producer whether each batch was written. On a retriable error the producer retries, bounded by its retries and delivery.timeout.ms settings; otherwise it reports the failure through the Future returned by send() or the callback passed to it.
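The "full or timed out" rule can be illustrated with a deliberately simplified model of one partition’s batch. This is a sketch for intuition only, not Kafka’s actual accumulator: the class PartitionBatchModel and its methods are hypothetical, and the caller passes in the clock so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of one partition's batch; NOT Kafka's real implementation.
final class PartitionBatchModel {
    private final int batchSizeBytes;
    private final long lingerMs;
    private final List<byte[]> records = new ArrayList<>();
    private int bytes = 0;
    private long createdAtMs = -1; // time the first record was appended

    PartitionBatchModel(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    /** Appends a record; the first append starts the linger timer. */
    void append(byte[] record, long nowMs) {
        if (records.isEmpty()) {
            createdAtMs = nowMs;
        }
        records.add(record);
        bytes += record.length;
    }

    /** Ready when the batch is full OR the linger timer has expired. */
    boolean isReady(long nowMs) {
        if (records.isEmpty()) {
            return false;
        }
        boolean full = bytes >= batchSizeBytes;
        boolean lingerExpired = nowMs - createdAtMs >= lingerMs;
        return full || lingerExpired;
    }

    /** Hands the records to a (hypothetical) sender and resets the batch. */
    List<byte[]> drain() {
        List<byte[]> out = new ArrayList<>(records);
        records.clear();
        bytes = 0;
        createdAtMs = -1;
        return out;
    }
}
```

With a 100-byte limit and a 5 ms linger, a single 40-byte record becomes ready only at the 5 ms mark, whereas adding a 60-byte record fills the batch and makes it ready at once, regardless of how much linger time remains.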

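batch.size sets the upper bound, in bytes, on a single partition’s batch. It does not cap the whole request: one request can carry batches for several partitions on the same broker, and its total size is limited by max.request.size. A larger batch.size can lead to fewer, larger requests, which is more efficient as long as your network is not saturated and your brokers can handle them. However, a batch that exceeds the broker’s message.max.bytes (or the topic-level max.message.bytes) can be rejected, and replica.fetch.max.bytes is best kept at least as large so followers can replicate big batches efficiently. A larger batch.size does not add latency on its own, since a batch never waits longer than linger.ms, but with a high linger.ms and light traffic the early messages in a batch wait longer before they are sent.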

linger.ms introduces a delay. A value of 0 means the producer will try to send messages as soon as they are ready (though they will still be batched up to batch.size). Increasing linger.ms allows the producer to wait longer for more messages, potentially creating larger batches and thus higher throughput, but at the cost of increased end-to-end latency for messages within that batch. If your application needs low latency, you’ll want a small linger.ms. If throughput is king and latency is secondary, you can experiment with higher values.
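As a sketch of that trade-off, the two profiles below contrast a latency-oriented and a throughput-oriented setup. The class name ProducerTuningProfiles and the specific numbers are illustrative assumptions rather than recommendations; only the property keys batch.size and linger.ms come from Kafka itself.

```java
import java.util.Properties;

// Two illustrative tuning profiles; the numeric values are assumptions.
final class ProducerTuningProfiles {

    /** Favors latency: small batches, no deliberate waiting. */
    static Properties lowLatency() {
        Properties p = new Properties();
        p.put("batch.size", 16384);  // 16 KB
        p.put("linger.ms", 0);       // send as soon as the sender thread is free
        return p;
    }

    /** Favors throughput: larger batches, willing to wait for them to fill. */
    static Properties throughputFirst() {
        Properties p = new Properties();
        p.put("batch.size", 131072); // 128 KB
        p.put("linger.ms", 50);      // wait up to 50 ms for more records
        return p;
    }
}
```

Either profile can be merged into the connection and serializer settings with props.putAll(...) before the KafkaProducer is constructed.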

Finding the sweet spot involves balancing throughput and latency. For maximum throughput, you generally want to send the largest possible batches without causing other bottlenecks, which often means increasing both batch.size and linger.ms. However, you must consider your network bandwidth, broker capacity, and the latency tolerance of your application. An aggressive starting point for throughput-first workloads is batch.size=1MB (1048576 bytes) and linger.ms=100ms; more conservative guidance starts around 100 to 200 KB for batch.size with linger.ms between 10 and 100 ms. Keep in mind that a 1 MB batch sits close to the broker’s default message.max.bytes limit (about 1 MB), so check that limit before going this high. You’d then monitor producer metrics like record-send-rate and batch-size-avg to see if further tuning is beneficial. If your batch-size-avg is consistently much lower than your batch.size, batches are being sent by the linger timer before they fill, and you might be able to increase linger.ms to fill them more effectively. Conversely, if batch-size-avg is frequently at or near batch.size, batches are filling before the timer expires, and a larger batch.size may help, provided the network and brokers can absorb the larger requests.
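The batch-size-avg comparison can be turned into a small rule of thumb. The sketch below is a hypothetical helper, not part of the Kafka client: the name BatchTuningHint and the 50% and 90% fill thresholds are assumptions chosen for illustration. In a real application the average would come from the producer’s metrics, for example via KafkaProducer.metrics().

```java
// Hypothetical tuning heuristic; thresholds are illustrative assumptions.
final class BatchTuningHint {

    enum Hint { RAISE_LINGER_MS, RAISE_BATCH_SIZE, LEAVE_AS_IS }

    /**
     * Compares the observed average batch size (bytes) with the configured
     * batch.size (bytes) and suggests which setting to look at first.
     */
    static Hint suggest(double batchSizeAvg, int batchSize) {
        double fillRatio = batchSizeAvg / batchSize;
        if (fillRatio < 0.5) {
            return Hint.RAISE_LINGER_MS;  // batches leave mostly empty
        }
        if (fillRatio > 0.9) {
            return Hint.RAISE_BATCH_SIZE; // batches are hitting the size cap
        }
        return Hint.LEAVE_AS_IS;
    }
}
```

For example, an average of 2,000 bytes against a 16,384-byte batch.size suggests looking at linger.ms, while an average of 16,000 bytes suggests the size cap is the limiting factor.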

The interaction between batch.size and linger.ms is crucial. The producer sends a batch as soon as either condition is met. linger.ms is only an upper bound on the wait: even if it is set very high, a batch that reaches batch.size earlier is sent right away, and the remaining linger time is ignored. Conversely, under light traffic a batch may never fill, and it goes out when linger.ms expires regardless of its size.

The producer’s buffer memory (buffer.memory, 32 MB by default) is a key component. Every batch is allocated from it, so if it fills up, the application is producing faster than batches can be delivered, whether because the brokers are slow or because the network is congested. When that happens, send() blocks for up to max.block.ms and then fails with a timeout error. Monitoring buffer-available-bytes alongside request-latency-avg and record-error-rate in producer metrics is vital. If request-latency-avg is high and record-error-rate is increasing, it suggests that your batch sizes or linger times might be too aggressive for your current Kafka cluster or network.

The next logical step after optimizing batching for throughput is to consider message idempotence and transactional guarantees, which introduce their own complexities and impact on performance.