Kafka brokers are choking on oversized messages, failing to deliver them to consumers because the default configuration is too restrictive for your needs.
The core issue is a mismatch between the maximum message size Kafka is willing to accept and the size of the messages your producers are attempting to send. This isn’t just about disk space; it’s a fundamental limit enforced by the broker to prevent resource exhaustion and maintain stability.
Here’s how to diagnose and fix it, starting with the most common culprits:
1. message.max.bytes on the Broker
This is the absolute hard limit on the Kafka broker side. If a producer sends a message larger than this, the broker will reject it outright.
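- Diagnosis: Check the broker’s server.properties file, or query the effective value with kafka-configs.sh.

      # Example: describe the configuration of broker 0.
      # --all (Kafka 2.5+) also lists defaults and values set in server.properties.
      kafka-configs.sh --bootstrap-server your_broker_host:9092 --describe --entity-type brokers --entity-name 0 --all

  Look for message.max.bytes. The default is roughly 1 MB (1048588 bytes in recent releases).
- Fix: Increase message.max.bytes in server.properties on all your brokers.

      # Set to 5 MB (keep comments on their own line in a properties file)
      message.max.bytes=5242880

  Restart your brokers for the change to take effect. If a topic sets its own max.message.bytes, that topic-level value takes precedence for the topic and must be raised as well.
- Why it works: This directly tells the broker to allow larger incoming messages up to the specified limit.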
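One pitfall when editing server.properties is worth a quick check. The sketch below assumes the file is parsed with java.util.Properties semantics, where a # starts a comment only at the beginning of a line, so a trailing comment on the value line becomes part of the value. The class name BrokerLimitReader and the fallback constant are illustrative assumptions, not part of Kafka.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class BrokerLimitReader {
    // Assumed fallback when the key is absent (the documented default in recent releases).
    public static final long ASSUMED_DEFAULT_MESSAGE_MAX_BYTES = 1_048_588L;

    /** Parses properties-file text and returns message.max.bytes, or the assumed default. */
    public static long messageMaxBytes(String serverPropertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(serverPropertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty("message.max.bytes");
        if (raw == null) {
            return ASSUMED_DEFAULT_MESSAGE_MAX_BYTES;
        }
        // parseLong rejects anything that is not a plain integer,
        // including a "# comment" left on the same line as the value.
        return Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        // Comment on its own line: the value parses cleanly.
        String good = "# Set to 5 MB\nmessage.max.bytes=5242880\n";
        System.out.println(messageMaxBytes(good));

        // Comment on the value line: the whole tail is read as the value.
        String bad = "message.max.bytes=5242880 # Set to 5MB\n";
        try {
            messageMaxBytes(bad);
        } catch (NumberFormatException e) {
            System.out.println("rejected: inline comment became part of the value");
        }
    }
}
```

Running a check like this against a copy of the file before restarting brokers can catch a malformed value early.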
2. replica.fetch.max.bytes on the Broker
This setting controls how many bytes a follower replica attempts to fetch per partition when replicating from the leader. It does not limit what consumers can read; its job is to keep replication working. On brokers older than 0.10.1, a message larger than this value could not be replicated, so followers fell out of sync. Newer brokers still return an oversized first batch so that replication makes progress, but keeping this value at least as large as message.max.bytes is still the usual recommendation.
- Diagnosis: Check server.properties for replica.fetch.max.bytes, or query the broker. The default is 1 MB (1048576 bytes).

      # --all (Kafka 2.5+) also lists defaults and values set in server.properties.
      kafka-configs.sh --bootstrap-server your_broker_host:9092 --describe --entity-type brokers --entity-name 0 --all

- Fix: Increase replica.fetch.max.bytes in server.properties on all your brokers.

      # Set to 5 MB
      replica.fetch.max.bytes=5242880

  Restart your brokers.
- Why it works: This allows follower brokers to fetch larger chunks of data from the leader, so bigger messages replicate without stalling.
3. socket.request.max.bytes on the Broker
The broker has no max.request.size setting; that name belongs to the producer (see item 5). The broker-side counterpart is socket.request.max.bytes, which limits the size of any single request a client sends to the broker. This includes Produce requests, which contain the messages, so the total size of a producer’s request cannot exceed this value.
- Diagnosis: Check server.properties for socket.request.max.bytes, or query the broker. The default is 100 MB (104857600 bytes), so it only becomes the bottleneck for very large messages or batches.

      # --all (Kafka 2.5+) also lists defaults and values set in server.properties.
      kafka-configs.sh --bootstrap-server your_broker_host:9092 --describe --entity-type brokers --entity-name 0 --all

- Fix: If your requests can exceed the limit, increase socket.request.max.bytes in server.properties on all your brokers.

      # Set to 200 MB
      socket.request.max.bytes=209715200

  Restart your brokers.
- Why it works: This permits the broker to accept larger Produce requests from producers, enabling them to send larger individual messages or larger batches of smaller messages.
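4. max.partition.fetch.bytes on the Consumer
Despite often being listed with the broker settings, this is a consumer setting. It limits the maximum amount of data per partition that the broker returns to the consumer in a single Fetch request. With consumers older than 0.10.1, a single message larger than this limit could not be fetched, and the consumer stalled on that partition. Newer consumers still receive an oversized first batch so they can make progress, but sizing this value for your largest message keeps fetches predictable.
- Diagnosis: Check your consumer’s configuration (for example consumer.properties, or the Properties passed to KafkaConsumer) for max.partition.fetch.bytes. The default is 1 MB (1048576 bytes). Because it is a client setting, it does not appear in the broker’s server.properties.
- Fix: Increase max.partition.fetch.bytes in the configuration of every consumer that reads the topic.

      # Set to 5 MB
      max.partition.fetch.bytes=5242880

  Restart the consumers for the change to take effect; no broker restart is needed.
- Why it works: This allows consumers to pull larger individual messages or larger chunks of data per partition in a single request.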
5. Producer max.request.size
Even if the brokers are configured to accept large messages, your producer client needs to be configured to send them. The producer’s max.request.size setting limits the size of a single batch of records that the producer will attempt to send in one request.
- Diagnosis: Examine your producer’s configuration properties.

      import java.util.Properties;

      Properties props = new Properties();
      props.put("bootstrap.servers", "your_broker_host:9092");
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
      props.put("max.request.size", "1048576"); // Default is 1MB

- Fix: Increase the max.request.size property in your producer’s configuration.

      props.put("max.request.size", "5242880"); // Set to 5MB

  This change is applied when you instantiate your KafkaProducer.
- Why it works: This tells the producer client to construct and send batches of records that can be up to this size, aligning with the broker’s acceptance limits.
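Before raising the limit, it can help to estimate how large your records actually are. The following is a rough pre-flight sketch, not how the Kafka client measures records: the class RecordSizeCheck and the ASSUMED_OVERHEAD_BYTES allowance are hypothetical, and the real framing overhead depends on headers, compression, and the record format in use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RecordSizeCheck {
    // Hypothetical allowance for record and batch framing; not a value taken from Kafka.
    public static final int ASSUMED_OVERHEAD_BYTES = 512;

    /** Estimates the request bytes needed for one record with UTF-8 string key and value. */
    public static long estimatedSize(String key, String value) {
        long keyBytes = key == null ? 0 : key.getBytes(StandardCharsets.UTF_8).length;
        long valueBytes = value == null ? 0 : value.getBytes(StandardCharsets.UTF_8).length;
        return keyBytes + valueBytes + ASSUMED_OVERHEAD_BYTES;
    }

    /** Returns true if the estimate stays within the configured max.request.size. */
    public static boolean fits(String key, String value, long maxRequestSize) {
        return estimatedSize(key, value) <= maxRequestSize;
    }

    public static void main(String[] args) {
        // Build a 2 MB ASCII payload (one byte per character in UTF-8).
        char[] chars = new char[2 * 1024 * 1024];
        Arrays.fill(chars, 'x');
        String payload = new String(chars);

        System.out.println("fits at 1 MB: " + fits("order-1", payload, 1_048_576L));
        System.out.println("fits at 5 MB: " + fits("order-1", payload, 5_242_880L));
    }
}
```

Logging this estimate for your largest payloads gives a concrete number to size max.request.size against, rather than guessing.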
6. Consumer fetch.max.bytes
This consumer-side setting limits the maximum amount of data the broker returns for a single fetch request from one consumer, across all partitions in that request; it is not a per-group limit. The default is 50 MB (52428800 bytes), so it usually matters only when it has been lowered or when messages are very large. It is not an absolute maximum: an oversized first batch is still returned so the consumer can make progress.
- Diagnosis: Examine your consumer’s configuration properties for an explicit fetch.max.bytes.

      import java.util.Properties;

      Properties props = new Properties();
      props.put("bootstrap.servers", "your_broker_host:9092");
      props.put("group.id", "my_group");
      props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
      props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
      props.put("fetch.max.bytes", "1048576"); // Lowered to 1 MB; the default is 50 MB

- Fix: Raise the fetch.max.bytes property in your consumer’s configuration to at least the size of your largest message, or remove the override to return to the default.

      props.put("fetch.max.bytes", "5242880"); // Set to 5MB

  This change is applied when you instantiate your KafkaConsumer.
- Why it works: This allows the consumer client to request and receive larger chunks of data from the broker, accommodating larger messages.
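The two consumer-side fetch limits can be derived together from the largest message you expect. This is a minimal sketch using only java.util.Properties; the class ConsumerFetchSettings is hypothetical, and the two default constants are assumptions based on the documented client defaults (50 MB and 1 MB).

```java
import java.util.Properties;

public class ConsumerFetchSettings {
    // Assumed client defaults; check the documentation for your Kafka version.
    public static final long DEFAULT_FETCH_MAX_BYTES = 52_428_800L;
    public static final long DEFAULT_MAX_PARTITION_FETCH_BYTES = 1_048_576L;

    /** Returns fetch settings that are never below the defaults or the largest message. */
    public static Properties forLargestMessage(long largestMessageBytes) {
        Properties props = new Properties();
        props.setProperty("fetch.max.bytes",
                String.valueOf(Math.max(DEFAULT_FETCH_MAX_BYTES, largestMessageBytes)));
        props.setProperty("max.partition.fetch.bytes",
                String.valueOf(Math.max(DEFAULT_MAX_PARTITION_FETCH_BYTES, largestMessageBytes)));
        return props;
    }

    public static void main(String[] args) {
        // A 5 MB message needs a larger per-partition limit but fits the default fetch limit.
        Properties props = forLargestMessage(5L * 1024 * 1024);
        System.out.println("fetch.max.bytes=" + props.getProperty("fetch.max.bytes"));
        System.out.println("max.partition.fetch.bytes=" + props.getProperty("max.partition.fetch.bytes"));
    }
}
```

The returned entries can be merged into the Properties you pass to KafkaConsumer alongside the connection and deserializer settings.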
Important Note on Consistency
It’s crucial that the producer’s max.request.size and the broker’s message.max.bytes (plus any topic-level max.message.bytes override) are all at least as large as the largest individual message you intend to send. The broker’s socket.request.max.bytes, 100 MB by default, must also be larger than your largest produce request. Similarly, the broker’s replica.fetch.max.bytes and the consumer’s max.partition.fetch.bytes and fetch.max.bytes should be large enough to accommodate these messages. If any single component in the chain is smaller than the message size, you can still have problems.
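The consistency rule can be expressed as a small check that flags every limit below your largest message. This is an illustrative sketch: the class SizeLimitChain and the labels in the map are hypothetical, and the values would come from your own producer, broker, and consumer configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SizeLimitChain {
    /** Returns the names of all limits that are smaller than the largest message. */
    public static List<String> tooSmall(Map<String, Long> limits, long largestMessageBytes) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, Long> entry : limits.entrySet()) {
            if (entry.getValue() < largestMessageBytes) {
                offenders.add(entry.getKey());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Example values: producer and consumer were raised, the broker was forgotten.
        Map<String, Long> limits = new LinkedHashMap<>();
        limits.put("producer max.request.size", 5_242_880L);
        limits.put("broker message.max.bytes", 1_048_588L);
        limits.put("broker replica.fetch.max.bytes", 1_048_576L);
        limits.put("consumer max.partition.fetch.bytes", 5_242_880L);

        // prints [broker message.max.bytes, broker replica.fetch.max.bytes]
        System.out.println(tooSmall(limits, 4_000_000L));
    }
}
```

An empty result means every limit in the map can hold the largest message; any name in the list is a link in the chain that still needs raising.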
After fixing these, you might encounter issues with request.timeout.ms if your large messages are also causing operations to take longer than the default timeout.