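The NATS JetStream server is rejecting a publish because the sequence the client says it expects does not match the stream’s actual last sequence. The server treats this as out-of-order publishing. The usual cause is stale client state; less often it points to data corruption or replayed messages.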
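Clock skew between the NATS server and the client application is a commonly cited reason for this error. JetStream assigns stream sequence numbers and message timestamps on the server, so skew on its own is unlikely to reorder messages. It may still distort time-based behavior such as the deduplication window in a cluster, and it makes client and server logs hard to correlate, so it is worth ruling out first.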
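Diagnosis: Check server and client clock synchronization. On Linux, ntpdate -q <NATS_SERVER_IP> reports the offset between the local clock and the server’s, provided the server host answers NTP queries. If it does not, run date -u or chronyc tracking on both machines and compare the results.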
Fix: Ensure both server and client are synchronized to a reliable NTP source. For example, on a Linux client, install and configure ntpd or chrony to sync with pool.ntp.org.
Why it works: Consistent timekeeping across all participants in the distributed system is fundamental for ordered processing and preventing replay attacks.
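To put a number on the skew rather than comparing two clocks by eye, the offset can be measured with a single SNTP exchange. The following is a minimal sketch using only the Python standard library. The function names are illustrative and not part of any NATS tooling, and it assumes the target host answers NTP on UDP port 123.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_DELTA = 2208988800


def build_request() -> bytes:
    """48-byte SNTP client request: LI=0, version 3, mode 3 (client)."""
    return b"\x1b" + b"\x00" * 47


def parse_timestamp(packet: bytes, start: int) -> float:
    """Decode the 64-bit NTP timestamp at byte offset `start` as Unix seconds."""
    seconds, fraction = struct.unpack("!II", packet[start:start + 8])
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Standard NTP offset estimate.

    t0: client send time, t1: server receive time,
    t2: server transmit time, t3: client receive time.
    A positive result means the remote clock is ahead of the local one.
    """
    return ((t1 - t0) + (t2 - t3)) / 2


def query_offset(host: str, timeout: float = 2.0) -> float:
    """Ask `host` for the time over SNTP and return the offset in seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(build_request(), (host, 123))
        packet, _ = sock.recvfrom(512)
        t3 = time.time()
    if len(packet) < 48:
        raise ValueError("short NTP response")
    t1 = parse_timestamp(packet, 32)  # server receive timestamp
    t2 = parse_timestamp(packet, 40)  # server transmit timestamp
    return clock_offset(t0, t1, t2, t3)
```

Running query_offset("pool.ntp.org") on both the client host and the server host gives each machine’s offset against the same reference, and the difference between the two results approximates the skew between them.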
Another frequent culprit is a client application that is not properly managing its sequence numbers. If a client restarts and doesn’t correctly resume its last known sequence, it can appear to be sending old messages.
Diagnosis: Examine client-side logging for sequence number handling. Look for where the client stores its last sent sequence and how it reloads it after a restart.
Fix: Implement robust persistence for the client’s last acknowledged sequence number, for example in a file or a small database. The stream sequence is returned in the PubAck for each successful publish. On restart, the client must read the persisted value and supply it as the expected last sequence on its next publish. This is a per-publish setting, not a connection option: it travels in the Nats-Expected-Last-Sequence header, which the nats.go client exposes as the nats.ExpectLastSequence publish option. For example, if the last sequence was 12345, the next publish should carry an expected last sequence of 12345.
Why it works: This explicitly tells the NATS server what the client believes the stream’s last sequence to be, so the server accepts the publish only when its own state agrees with the client’s.
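The persistence step might look like the sketch below, which uses only the Python standard library. The file layout and function names are illustrative assumptions, not part of any NATS client. Nats-Expected-Last-Sequence is the JetStream publish header that carries the expected sequence; how headers are attached to a publish depends on the client library in use.

```python
import json
import os
import tempfile
from typing import Optional

# JetStream publish header carrying the expected last stream sequence.
EXPECTED_LAST_SEQ_HEADER = "Nats-Expected-Last-Sequence"


def save_last_sequence(path: str, seq: int) -> None:
    """Atomically record the stream sequence from the most recent PubAck."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".seq-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump({"last_seq": seq}, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic on the same filesystem, so a crash leaves
        # either the old file or the new one, never a half-written file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_last_sequence(path: str) -> Optional[int]:
    """Return the persisted sequence, or None on a first run."""
    try:
        with open(path) as f:
            return int(json.load(f)["last_seq"])
    except FileNotFoundError:
        return None


def expected_sequence_headers(seq: Optional[int]) -> dict:
    """Headers to attach to the next publish; empty when nothing is known."""
    if seq is None:
        return {}
    return {EXPECTED_LAST_SEQ_HEADER: str(seq)}
```

A client would call save_last_sequence after every PubAck, then on startup pass expected_sequence_headers(load_last_sequence(path)) to its first publish. Writing to a temporary file and renaming it avoids a corrupt state file if the process dies mid-write.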
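Network issues can also lead to this error, although not through raw packet reordering: NATS runs over TCP, which hands data to the server in order on any single connection. The more likely mechanism is packet loss or latency that makes a publish time out on the client even though the server stored the message. The client then retries or reconnects with a stale view of the stream’s last sequence.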
Diagnosis: Use network monitoring tools like tcpdump on both the client and server to capture traffic and analyze packet order. Look for TCP retransmissions or out-of-order packet reports.
Fix: Address the underlying network instability. This might involve investigating faulty network hardware, misconfigured routers, or network congestion. There is no reliable server-side workaround: settings such as max_deliver govern consumer delivery and acknowledgments, not tolerance for publish sequence mismatches, and they are configured on consumers rather than in the jetstream block of nats-server.conf. On the client side, retry timed-out publishes with the same Nats-Msg-Id so the server can discard duplicates, and refresh the expected sequence from the stream before retrying. The primary fix is network stability.
Why it works: A stable network keeps publishes and their acknowledgments flowing without timeouts, so the client’s view of the last sequence stays in step with the server’s.
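While the network is being fixed, a client can at least make its retries safe. JetStream discards publishes that repeat a Nats-Msg-Id header within the stream’s duplicate window, so reusing one ID across retries of the same message avoids double writes when only the acknowledgment was lost. The sketch below shows the retry loop in isolation. The publish callable is a hypothetical stand-in for whatever the client library provides, and the backoff values are arbitrary.

```python
import time
import uuid
from typing import Callable, Dict, Optional

MSG_ID_HEADER = "Nats-Msg-Id"  # JetStream header used for deduplication


def publish_with_retry(
    publish: Callable[[bytes, Dict[str, str]], int],
    payload: bytes,
    attempts: int = 3,
    backoff_seconds: float = 0.5,
) -> int:
    """Publish `payload`, reusing one message ID across all retries.

    `publish` is a caller-supplied (hypothetical) function that sends the
    payload with the given headers and returns the stream sequence from the
    PubAck, raising TimeoutError when no acknowledgment arrives in time.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Generated once, outside the loop, so every retry carries the same ID.
    headers = {MSG_ID_HEADER: uuid.uuid4().hex}
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return publish(payload, headers)
        except TimeoutError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff between attempts.
                time.sleep(backoff_seconds * (2 ** attempt))
    raise last_error
```

The important design choice is that the message ID is tied to the message, not to the attempt. Generating a fresh ID inside the loop would defeat deduplication.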
A client library bug or an incorrect implementation of the JetStream API on the client side can also be the cause.
Diagnosis: Review the client application’s code that interacts with JetStream. Pay close attention to how PublishMsg is called, especially any manual sequence number manipulation or custom headers.
Fix: Update the NATS client library to the latest stable version. If the issue persists, consider submitting a bug report to the library’s maintainers with a reproducible example. For instance, ensure you are using the nats.go client with a recent version like v1.24.0 or later.
Why it works: Newer library versions often contain bug fixes for edge cases in API interactions, including sequence number handling.
If the NATS JetStream stream itself has been restored from a backup or has experienced data corruption, the sequence numbers might be inconsistent.
Diagnosis: Check the JetStream stream’s metadata. Use the NATS CLI: nats context select <your_context> followed by nats stream info <your_stream_name>. Look at the First Sequence and Last Sequence values.
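The same check can be scripted against the CLI’s JSON output (nats stream info <your_stream_name> --json). The sketch below assumes that output contains a state object with messages, first_seq, and last_seq fields. The function name and the sample document are made up for illustration.

```python
import json


def summarize_stream_state(info_json: str) -> dict:
    """Summarize sequence bounds from stream info JSON and flag impossible counts."""
    state = json.loads(info_json)["state"]
    first = state["first_seq"]
    last = state["last_seq"]
    messages = state["messages"]
    # Number of sequence slots between first and last, inclusive.
    span = max(last - first + 1, 0)
    return {
        "first_seq": first,
        "last_seq": last,
        "messages": messages,
        # Slots inside the range with no message (deleted or purged).
        "deleted_in_range": max(span - messages, 0) if messages else 0,
        # A stream cannot hold more messages than it has sequence slots.
        "consistent": messages <= span,
    }


# Illustrative sample only, not captured from a real server.
SAMPLE = '{"config": {"name": "ORDERS"}, "state": {"messages": 8, "first_seq": 3, "last_seq": 12}}'

if __name__ == "__main__":
    print(summarize_stream_state(SAMPLE))
```

A result with consistent set to False, or a last_seq lower than the value the client has persisted, suggests the stream was restored from an older backup or rebuilt.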
Fix: If corruption is suspected, the safest approach is to recreate the stream. This involves stopping producers, letting consumers finish processing what is already in the stream (if possible), deleting the stream, and then recreating it with its original configuration. Run nats stream rm <your_stream_name> and then nats stream add <your_stream_name> --subjects <subject_pattern> --storage <file|memory> --replicas <N>. Deleting the stream discards its stored messages, so back up anything you still need first.
Why it works: A clean stream starts with a known, consistent state, eliminating any prior inconsistencies.
A distributed denial-of-service (DDoS) attack targeting the NATS server or the client could flood the system with malformed or duplicate messages. Replayed publishes that carry stale sequence expectations are rejected by the same check, so an attack can surface as a burst of sequence errors.
Diagnosis: Monitor network traffic for unusually high volumes of connections or messages from unexpected IP addresses. Check NATS server logs for repeated connection attempts or errors from specific clients.
Fix: Implement network-level security measures such as firewalls, rate limiting, and IP blocking. Configure the NATS server with authentication and authorization to restrict access. Consider using a NATS proxy or load balancer with DDoS mitigation capabilities.
Why it works: Security measures prevent malicious traffic from overwhelming the NATS server, allowing it to process legitimate messages without triggering its integrity checks.
The next error you’re likely to encounter after resolving this is "consumer is missing", which indicates that the consumer responsible for processing messages from the stream has been removed or is in an unhealthy state.