A sequence mismatch error in NATS means the sequence number a consumer expected next does not match the one it actually received, so messages were skipped, duplicated, or delivered out of order, and the consumer can no longer process them reliably.

This usually happens when a NATS JetStream consumer’s tracked position drifts away from the stream, typically after the consumer falls behind, restarts, or is recreated, which leads to duplicate or out-of-order messages.
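To make the idea of a "mismatch" concrete, the sketch below tracks the last sequence seen and classifies each new one as in order, a duplicate, or a gap. It is a minimal Python illustration, not the NATS client’s own implementation; `SequenceTracker` and `observe` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class SequenceTracker:
    """Tracks the last sequence seen and classifies each new one."""

    last_seq: int = 0  # 0 means nothing has been seen yet

    def observe(self, seq: int) -> str:
        expected = self.last_seq + 1
        if seq == expected:
            self.last_seq = seq
            return "ok"
        if seq <= self.last_seq:
            # Already seen: a redelivery or replay. Position is unchanged.
            return "duplicate"
        # seq > expected: one or more messages were skipped.
        self.last_seq = seq
        return "gap"


tracker = SequenceTracker()
results = [tracker.observe(seq) for seq in (1, 2, 2, 5, 6)]
print(results)
```

Anything other than "ok" is the situation the rest of this article tries to diagnose.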

Common Causes and Fixes

  1. Consumer Lag and Replay: The most frequent culprit is a consumer that has fallen significantly behind its stream. When the consumer eventually catches up or restarts, JetStream might provide it with messages from a point in time that doesn’t align with its last processed sequence number, often due to internal state reconciliation or a consumer restart during a stream snapshot.

    • Diagnosis: Check consumer status for lag.

      nats consumer info <stream_name> <consumer_name>
      

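      Compare delivered.stream_seq (last delivered message) with ack_floor.stream_seq (acknowledgment floor), and check num_ack_pending, num_redelivered, and num_pending; add --json to the command to see these field names. A large num_pending means the consumer is far behind the stream, and an ack floor that stays well below the delivered sequence without active delivery means messages were handed out but never acknowledged.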

    • Fix: Delete and recreate the consumer. This discards its stored position, and the new consumer starts from the point its deliver policy selects.

      nats consumer rm <stream_name> <consumer_name>
      nats consumer add <stream_name> <consumer_name> --filter <filter_subject> --deliver all --ack explicit --max-deliver 10
      

      (Adjust --filter, --deliver, --ack, and --max-deliver to your application’s needs; the CLI prompts for any setting not given as a flag. --deliver all starts from the beginning of the stream, while --deliver last starts from the most recent message. An explicit ack policy is crucial for reliable processing.)

    • Why it works: Deleting and recreating the consumer resets its internal state to a known good point (the beginning of the stream with --deliver all, or the most recent message with --deliver last), so it no longer tries to process messages based on a stale sequence number.
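The lag check from the diagnosis step can be scripted. The sketch below is a minimal Python example that summarizes lag from the JSON printed by `nats consumer info <stream_name> <consumer_name> --json`. The sample document is illustrative only (trimmed to the fields used, with made-up values), and `summarize_lag` is a hypothetical helper name.

```python
import json

# Illustrative sample only: trimmed to the fields used below, values made up.
SAMPLE_INFO = """
{
  "stream_name": "ORDERS",
  "name": "worker",
  "delivered": {"consumer_seq": 1200, "stream_seq": 1180},
  "ack_floor": {"consumer_seq": 900, "stream_seq": 880},
  "num_ack_pending": 300,
  "num_redelivered": 20,
  "num_pending": 50000
}
"""


def summarize_lag(info: dict) -> dict:
    """Reduce a consumer info document to a few lag indicators."""
    delivered = info["delivered"]["stream_seq"]
    ack_floor = info["ack_floor"]["stream_seq"]
    return {
        # Stream sequences handed out but not yet covered by the ack floor.
        "unacked_span": delivered - ack_floor,
        "ack_pending": info.get("num_ack_pending", 0),
        "redelivered": info.get("num_redelivered", 0),
        # Matching messages in the stream that were never delivered.
        "not_yet_delivered": info.get("num_pending", 0),
    }


summary = summarize_lag(json.loads(SAMPLE_INFO))
print(summary)
```

To run it against a live consumer, replace `json.loads(SAMPLE_INFO)` with `json.load(sys.stdin)` and pipe the CLI output into the script.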

  2. Stream Snapshotting During Consumer Operation: If a stream is snapshotted while a consumer is actively processing messages (e.g., a backup taken with nats stream backup and later restored, or Raft snapshots and catch-up in a clustered deployment), the consumer’s sequence tracking can become desynchronized from the stream state that results.

    • Diagnosis: Search the JetStream server logs for snapshot, catch-up, and leader-election messages around the time of the error, and check whether a backup or restore ran against the stream. Also compare delivered.stream_seq in nats consumer info with state.last_seq in nats stream info.

    • Fix: Keep snapshot operations away from the window in which the consumer is recovering, then recreate the consumer.

      # JetStream stream configuration has no snapshot-interval setting, so
      # snapshotting cannot be switched off per stream.
      # Instead, pause any scheduled backup or restore jobs for this stream,
      # and in a cluster wait until nats stream info reports every replica
      # as current.
      
      # Then delete and recreate the consumer as in Cause 1.
      nats consumer rm <stream_name> <consumer_name>
      nats consumer add <stream_name> <consumer_name> --filter <filter_subject> --deliver all --ack explicit --max-deliver 10
      
    • Why it works: Keeping snapshot and restore activity out of the recovery window prevents the stream’s state from changing underneath the consumer while it is catching up. Recreating the consumer then aligns it with the stream’s new, consistent state.

  3. Multiple Consumers on the Same Stream/Filter: If you have multiple consumers reading from the same stream with overlapping filter subjects, and one consumer is significantly slower than others, it can lead to its state becoming stale relative to the others, and potentially to sequence mismatches when it tries to catch up.

    • Diagnosis: Compare delivered.stream_seq and num_pending across all consumers on the same stream. The consumer with the lowest delivered sequence and the highest num_pending is the slowest.

      nats consumer info <stream_name> <consumer_name_1>
      nats consumer info <stream_name> <consumer_name_2>
      # ... and so on for all consumers
      
    • Fix: Ensure all consumers are configured with appropriate max_deliver and ack_policy. If lag is persistent, consider scaling up the processing power for the slow consumer or adjusting its configuration to handle messages more efficiently. If strict ordering across all messages is critical and not just within a single consumer’s partition, you might need a different architectural pattern or ensure only one consumer processes a specific subset of messages.

      # Example: Increase max_deliver on the existing consumer.
      # The ack policy cannot be changed on an existing consumer; delete and
      # recreate it with --ack explicit if it currently uses another policy.
      nats consumer edit <stream_name> <slow_consumer_name> --max-deliver 50
      
    • Why it works: While this doesn’t directly fix a sequence mismatch, it addresses the root cause of a consumer falling behind. Improving the consumer’s throughput, or allowing each message more delivery attempts before JetStream stops redelivering it, keeps the consumer from drifting so far behind that its position no longer matches the stream.
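Comparing several consumers by hand gets tedious, so the comparison can be scripted. The sketch below ranks consumers by backlog, assuming you have already collected one `nats consumer info ... --json` document per consumer. The sample values are made up, and `rank_by_backlog` is a hypothetical helper name.

```python
def rank_by_backlog(infos: list[dict]) -> list[tuple[str, int]]:
    """Return (consumer name, backlog) pairs, largest backlog first.

    Backlog here is undelivered messages plus delivered-but-unacked ones.
    """
    backlog = [
        (info["name"], info.get("num_pending", 0) + info.get("num_ack_pending", 0))
        for info in infos
    ]
    return sorted(backlog, key=lambda item: item[1], reverse=True)


# Illustrative values only.
SAMPLE = [
    {"name": "billing", "num_pending": 12, "num_ack_pending": 3},
    {"name": "audit", "num_pending": 48000, "num_ack_pending": 500},
    {"name": "email", "num_pending": 0, "num_ack_pending": 1},
]

ranking = rank_by_backlog(SAMPLE)
print(ranking[0])
```

The first entry is the consumer to investigate before changing any configuration.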

  4. Consumer Restart with deliver-policy other than all or last: If a consumer is recreated on restart with its deliver policy set to new or by_start_sequence (with an older sequence), and the stream has since advanced significantly, it can lead to a mismatch. The new policy delivers only messages published after the consumer is created, so everything published while the consumer was absent is skipped, and by_start_sequence can point at a sequence that retention has already removed.

    • Diagnosis: Check the consumer’s deliver_policy in its configuration.

      nats consumer info <stream_name> <consumer_name> --json
      

      In the config section, look for "deliver_policy": "new" or "deliver_policy": "by_start_sequence".

    • Fix: Recreate the consumer with its deliver policy set to all or last so that it starts from a known point in the stream.

      nats consumer rm <stream_name> <consumer_name>
      nats consumer add <stream_name> <consumer_name> --filter <filter_subject> --deliver last --ack explicit --max-deliver 10
      
    • Why it works: --deliver last starts the consumer at the most recent message and --deliver all starts it at the first message still in the stream. Either way the starting point is a sequence that exists, which avoids fetching from a point in the past that no longer aligns with the stream’s active sequence.

  5. Network Partitions or Server Restarts During Message Acknowledgment: If a consumer acknowledges a message, but that acknowledgment is lost due to a network partition or a NATS server restart before JetStream fully commits it, JetStream will later redeliver that message. If the application has already moved on and the stream has advanced, a sequence mismatch can occur because the application’s own position is now ahead of where JetStream thinks the consumer is.

    • Diagnosis: This is hard to diagnose directly. Look for frequent consumer restarts, high num_redelivered in nats consumer info, and check NATS server logs for leader elections or network disruptions during periods of high traffic.

    • Fix: Ensure your application’s acknowledgment logic is robust. Using an explicit ack policy is a must. For extreme resilience, consider implementing idempotent processing in your consumer so that re-processing a message has no adverse effects. If the issue persists, deleting and recreating the consumer as in Cause 1 is the most direct way to reset its state.

    • Why it works: Idempotency ensures that even if a message is redelivered and re-processed due to a lost ack, the outcome is the same as if it were processed only once, preventing state corruption. Deleting and recreating the consumer resets its position according to its deliver policy.
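One common way to get idempotent processing is to key each side effect on the message’s stream sequence and skip sequences that were already applied. The sketch below is a minimal in-memory Python illustration; `IdempotentHandler` is a hypothetical name, and a real service would persist the set of applied sequences together with the side effect rather than keep it in memory.

```python
from typing import Callable


class IdempotentHandler:
    """Applies each stream sequence at most once."""

    def __init__(self, apply: Callable[[bytes], None]) -> None:
        self._apply = apply
        self._seen: set[int] = set()  # in-memory only, for illustration

    def handle(self, stream_seq: int, payload: bytes) -> bool:
        """Return True if the payload was applied, False if it was a repeat."""
        if stream_seq in self._seen:
            return False  # redelivery: safe to ack again without re-applying
        self._apply(payload)
        self._seen.add(stream_seq)
        return True


ledger: list[bytes] = []
handler = IdempotentHandler(ledger.append)

# Sequence 7 arrives twice, as it would after a lost acknowledgment.
deliveries = [(7, b"debit 10"), (8, b"debit 5"), (7, b"debit 10")]
outcomes = [handler.handle(seq, body) for seq, body in deliveries]
print(outcomes, ledger)
```

The caller should still acknowledge the repeated delivery so that JetStream stops redelivering it.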

  6. Stream Configuration Changes (e.g., max_age, max_msgs): If stream limits are hit and messages are expired or removed (e.g., due to max_age or max_msgs), and a consumer is lagging, it might try to fetch a message that has been purged from the stream. JetStream might then try to reconcile this by advancing the consumer’s sequence, leading to a mismatch.

    • Diagnosis: Check the stream’s configuration for aggressive retention limits (max_age, max_msgs) and compare them with the consumer’s lag.

      nats stream info <stream_name>
      nats consumer info <stream_name> <consumer_name>
      

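      Look at config.max_age, config.max_msgs, and state.first_seq for the stream, and at delivered.stream_seq and num_pending for the consumer. If state.first_seq is already beyond the consumer’s delivered.stream_seq, messages were removed before the consumer received them.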

    • Fix: Increase retention limits on the stream if message retention is critical. If messages are expected to expire, ensure your consumer is designed to handle missing messages or to stop processing when it detects such a gap. Deleting and recreating the consumer will align it with what the stream currently holds.

      # Example: Increase max_age to 7 days
      nats stream edit <stream_name> --max-age 7d
      # Then delete and recreate the consumer
      nats consumer rm <stream_name> <consumer_name>
      nats consumer add <stream_name> <consumer_name> --filter <filter_subject> --deliver last --ack explicit --max-deliver 10
      
    • Why it works: Adjusting retention limits ensures that messages the consumer might need are still available. Deleting and recreating the consumer then aligns it with the stream’s current, valid sequence of messages.
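The retention check above comes down to comparing the stream’s first remaining sequence with the next sequence the consumer needs. The sketch below does that in Python, using the state section of `nats stream info --json` and the consumer info document. `removed_before_delivery` is a hypothetical helper name and the sample values are made up. Treat the result as an upper bound: for a filtered consumer, not every removed sequence would have matched its filter subject.

```python
def removed_before_delivery(stream_state: dict, consumer_info: dict) -> int:
    """Count stream sequences removed before the consumer received them."""
    first_seq = stream_state["first_seq"]
    next_needed = consumer_info["delivered"]["stream_seq"] + 1
    # Sequences next_needed .. first_seq - 1 are no longer in the stream.
    return max(0, first_seq - next_needed)


# Illustrative values only.
stream_state = {"first_seq": 5001, "last_seq": 9000}
lagging = {"delivered": {"stream_seq": 4000}}
caught_up = {"delivered": {"stream_seq": 8990}}

lost = removed_before_delivery(stream_state, lagging)
none_lost = removed_before_delivery(stream_state, caught_up)
print(lost, none_lost)
```

A non-zero result means raising the limits now will not bring those messages back; it only prevents further loss.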

If you’ve fixed sequence mismatches but haven’t addressed the underlying cause of consumer lag, the next error you’ll likely encounter is a "consumer is stalled" error, indicating the consumer is still not keeping up with the stream’s message rate.
