Kafka’s auto.offset.reset setting is a deceptively simple configuration: it dictates where a consumer starts reading when Kafka has no committed offset for its consumer group, or when the committed offset no longer exists on the broker (for example, because retention has deleted that data). The surprising part is that the default, latest, silently skips every message already in the topic, while switching to earliest can trigger massive, unexpected reprocessing.

Let’s see auto.offset.reset in action. Imagine a simple Kafka topic, user_events, with a few messages already in it.

// Message 1 (offset 0)
{ "user_id": 101, "event": "login", "timestamp": 1678886400 }

// Message 2 (offset 1)
{ "user_id": 102, "event": "view_product", "timestamp": 1678886460 }

// Message 3 (offset 2)
{ "user_id": 101, "event": "add_to_cart", "timestamp": 1678886520 }

Now, we have a consumer application, event_processor, that reads from user_events and writes processed data to a database.

Scenario 1: auto.offset.reset=latest (Default)

  1. Start event_processor for the first time. Kafka looks for an offset for the event_processor consumer group. It finds none.
  2. auto.offset.reset=latest kicks in. Kafka tells the consumer to start reading from the end of the topic – from the next message that arrives after the consumer starts.
  3. New messages arrive:
    // Message 4 (offset 3)
    { "user_id": 103, "event": "purchase", "timestamp": 1678886600 }
    
  4. event_processor reads Message 4 and processes it. It never saw Messages 1, 2, or 3.

Scenario 2: auto.offset.reset=earliest

  1. Start event_processor for the first time. Kafka looks for an offset for the event_processor consumer group. It finds none.
  2. auto.offset.reset=earliest kicks in. Kafka tells the consumer to start reading from the beginning of the topic – from offset 0.
  3. event_processor reads and processes Messages 1, 2, and 3. It then waits for new messages.
  4. New messages arrive:
    // Message 4 (offset 3)
    { "user_id": 103, "event": "purchase", "timestamp": 1678886600 }
    
  5. event_processor reads Message 4 and processes it.

This second scenario is how you achieve reprocessing. By setting auto.offset.reset=earliest in your consumer’s configuration, you instruct Kafka to begin consuming from the oldest available message in a partition if no committed offset exists for that consumer group.

The core problem auto.offset.reset solves is ensuring a consumer has a starting point. When a consumer group starts, Kafka needs to know which message to deliver first. It checks its internal state for that consumer group and partition. If it finds a committed offset (meaning the consumer group has already processed and committed its progress up to a certain point), it resumes from there. If it doesn’t find an offset, that’s where auto.offset.reset becomes critical.
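The decision described above can be sketched as a small function. This is an in-memory model for one partition, not Kafka client code: the function name and its parameters are illustrative assumptions. It also covers the none value, which raises an error instead of picking a starting point.

```python
from typing import Optional


def resolve_start_offset(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    auto_offset_reset: str = "latest",
) -> int:
    """Return the offset a consumer would start from for one partition.

    committed: committed offset for this group and partition, or None.
    log_start: offset of the oldest message still retained.
    log_end:   offset the next produced message will receive.
    """
    # A committed offset that still points into the log wins: resume there.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise the reset policy decides.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    if auto_offset_reset == "none":
        raise LookupError("no valid committed offset and auto.offset.reset=none")
    raise ValueError(f"unknown auto.offset.reset value: {auto_offset_reset!r}")


# user_events holds offsets 0, 1, 2, so the next message gets offset 3.
print(resolve_start_offset(None, 0, 3, "latest"))    # 3: only Message 4 onward
print(resolve_start_offset(None, 0, 3, "earliest"))  # 0: Messages 1-3 are read too
print(resolve_start_offset(2, 0, 3, "earliest"))     # 2: committed offset wins
```

Note that the reset policy is never consulted in the last call: once a group has a valid committed offset, changing auto.offset.reset has no effect on where it resumes.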

Here’s the configuration snippet for a consumer:

# consumer.properties
# Comments must sit on their own line: .properties files have no inline
# comments, so trailing "# ..." text would become part of the value.
bootstrap.servers=kafka.example.com:9092
group.id=event_processor_group
# earliest or latest
auto.offset.reset=earliest
# Be careful with auto-commit in production!
enable.auto.commit=true
auto.commit.interval.ms=5000
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer

The key levers you control are:

  • group.id: This uniquely identifies your consumer group. Kafka stores offsets per group.id per partition. Changing this effectively creates a "new" consumer group, which will then trigger the auto.offset.reset behavior.
  • auto.offset.reset: The crucial setting.
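    • latest: Start from the newest messages. Good for real-time processing where you don’t care about historical data when a consumer group is new or its committed offsets are gone. A restart with valid committed offsets resumes from those offsets regardless of this setting.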
    • earliest: Start from the oldest messages. Essential for reprocessing historical data or ensuring a new consumer processes everything.
  • enable.auto.commit: If true, the consumer automatically commits offsets periodically (controlled by auto.commit.interval.ms). This is convenient, but commits are not tied to your processing: after a crash, records whose offsets were committed before processing finished are skipped, and records processed before their offsets were committed are delivered again. Setting this to false and manually committing offsets (e.g., after successful database writes) provides stronger guarantees.
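To make the manual-commit pattern concrete, here is a minimal sketch that commits only after a write succeeds. It is an in-memory model, not Kafka client code: consume_with_manual_commit, FlakyDatabase, and the committed dictionary are hypothetical stand-ins for the consumer loop, the database, and Kafka's offset storage.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def consume_with_manual_commit(
    records: List[Record],
    committed: Dict[str, int],
    group_id: str,
    write_to_db: Callable[[Record], None],
) -> None:
    """Process records from the group's committed offset, committing after each write."""
    # Assumption: a group with no committed offset starts at 0 (earliest).
    start = committed.get(group_id, 0)
    for offset in range(start, len(records)):
        write_to_db(records[offset])      # may raise; nothing is committed then
        committed[group_id] = offset + 1  # committed offset = next offset to read


class FlakyDatabase:
    """Hypothetical database stand-in that fails once on a chosen event."""

    def __init__(self, fail_once_on_event: str) -> None:
        self.rows: List[Record] = []
        self._fail_once_on_event = fail_once_on_event

    def write(self, record: Record) -> None:
        if record["event"] == self._fail_once_on_event:
            self._fail_once_on_event = ""  # fail only the first time
            raise ConnectionError("simulated database outage")
        self.rows.append(record)


records: List[Record] = [
    {"user_id": 101, "event": "login"},
    {"user_id": 102, "event": "view_product"},
    {"user_id": 101, "event": "add_to_cart"},
]
committed: Dict[str, int] = {}
db = FlakyDatabase(fail_once_on_event="add_to_cart")

try:
    consume_with_manual_commit(records, committed, "event_processor_group", db.write)
except ConnectionError:
    pass  # the consumer "crashes" while writing the third record

offset_after_crash = committed["event_processor_group"]

# Restart: the group resumes at its committed offset and retries the failed record.
consume_with_manual_commit(records, committed, "event_processor_group", db.write)
```

After the restart, all three records are in db.rows and none was skipped. The trade-off is at-least-once delivery: a crash between the write and the commit would deliver that record again, so the downstream write should be idempotent.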

The most common reason people struggle with auto.offset.reset is misunderstanding the implications of changing group.id. If you have a deployed consumer application and you want to reprocess data, the simplest way to trigger auto.offset.reset=earliest is to deploy a new version of your consumer with a different group.id. Kafka will see this as a completely new consumer group, find no existing offset for it, and then apply the auto.offset.reset rule. The old consumer group, with its original group.id, will continue to track its own offsets independently.
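The effect of a new group.id can be illustrated with a small in-memory model of offset storage, keyed by group, topic, and partition. The starting_offset helper and the group name event_processor_group_v2 are assumptions made up for this sketch; nothing here talks to a real broker.

```python
from typing import Dict, Tuple

OffsetKey = Tuple[str, str, int]  # (group.id, topic, partition)


def starting_offset(
    committed: Dict[OffsetKey, int],
    group_id: str,
    topic: str,
    partition: int,
    log_start: int,
    log_end: int,
    auto_offset_reset: str,
) -> int:
    """Resume from the group's committed offset, else apply the reset policy."""
    key = (group_id, topic, partition)
    if key in committed:
        return committed[key]
    return log_start if auto_offset_reset == "earliest" else log_end


# The original group has already consumed offsets 0-3 of user_events.
committed: Dict[OffsetKey, int] = {("event_processor_group", "user_events", 0): 4}

old_start = starting_offset(
    committed, "event_processor_group", "user_events", 0, 0, 4, "earliest"
)
new_start = starting_offset(
    committed, "event_processor_group_v2", "user_events", 0, 0, 4, "earliest"
)
print(old_start, new_start)  # 4 0
```

Both groups run with auto.offset.reset=earliest, yet only the new one starts from offset 0. The setting matters only for the group that has no stored offset.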

The true power and danger of auto.offset.reset=earliest lies in its interaction with the consumer’s offset management. When enable.auto.commit is true, the consumer library commits the offsets of the records it has returned from poll() as part of its poll loop, at most once per auto.commit.interval.ms, regardless of whether your application has finished processing them. If an offset is committed and the application crashes before it finishes processing the corresponding records (for example, because they were handed off to another thread), those records are skipped when the consumer restarts: from your application’s perspective they are lost. Conversely, if the application crashes after processing records but before the next auto-commit, those records are delivered again on restart and processed twice. This is why manual commits, where you explicitly commit an offset only after a critical downstream action (like a database write) succeeds, are preferred for critical applications.
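The effect of crash timing can be checked with a small model that compares how far processing got against the offset that was last committed. The function name and parameters are illustrative assumptions, and the model covers a single partition only.

```python
from typing import List, Tuple


def replay_after_crash(
    batch_offsets: List[int],
    processed_upto: int,
    committed_offset: int,
) -> Tuple[List[int], List[int]]:
    """Classify offsets after a crash and restart.

    processed_upto:   offsets below this value were fully processed.
    committed_offset: offset the group resumes from after the restart.
    Returns (skipped, duplicated) offsets.
    """
    # Committed but never processed: the restart begins past them.
    skipped = [o for o in batch_offsets if processed_upto <= o < committed_offset]
    # Processed but never committed: the restart delivers them again.
    duplicated = [o for o in batch_offsets if committed_offset <= o < processed_upto]
    return skipped, duplicated


batch = [0, 1, 2, 3]

# Commit ran ahead of processing: offset 4 committed, only offset 0 processed.
print(replay_after_crash(batch, processed_upto=1, committed_offset=4))
# ([1, 2, 3], [])

# Processing ran ahead of the commit: offsets 0-2 processed, nothing committed.
print(replay_after_crash(batch, processed_upto=3, committed_offset=0))
# ([], [0, 1, 2])
```

With manual commits after each successful downstream write, committed_offset never exceeds processed_upto, so the skipped list stays empty and only duplicates remain possible.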

After you’ve successfully reprocessed your data, the next challenge is often managing the size of your Kafka topics and ensuring you don’t accidentally trigger another reprocessing event when you least expect it.
