The most surprising thing about idempotent message processing is that it’s not about preventing duplicate messages; it’s about making your system behave as if duplicates don’t exist, even when they do.

Let’s see this in action. Imagine a simple e-commerce scenario: a PlaceOrder event arrives.

{
  "eventType": "PlaceOrder",
  "eventId": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "orderId": "ORD-98765",
  "customerId": "CUST-123",
  "items": [
    {"productId": "PROD-A", "quantity": 2},
    {"productId": "PROD-B", "quantity": 1}
  ],
  "timestamp": "2023-10-27T10:00:00Z"
}

A consumer receives this message. Without idempotency, if the message is delivered twice, two identical orders might be created, leading to double-counted inventory, duplicate shipments, customer confusion, and a mess for the fulfillment team.

Here’s how we make it idempotent. The key is a unique identifier for each event: the eventId in our example. When a consumer processes an event, it first checks whether it has already processed an event with that eventId.

A common pattern is a database table, let’s call it processed_events, with a unique constraint on event_id (here, the primary key).

CREATE TABLE processed_events (
    event_id VARCHAR(36) PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

When the PlaceOrder consumer receives the message:

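  1. Within a database transaction, attempt to insert the eventId into processed_events:
    INSERT INTO processed_events (event_id) VALUES ('a1b2c3d4-e5f6-7890-1234-567890abcdef');

  2. Handle the outcome:
    • Success: The eventId was inserted, so this is the first time we’ve seen this event. Run the business logic (create the order, update inventory) in the same transaction, commit, and only then acknowledge the message. If anything fails before the commit, the rollback removes the marker too, and the redelivered message is processed from scratch. Side effects outside the database, such as sending a confirmation email, can’t join the transaction: if one happens before a crash, it will happen again on redelivery unless it is deduplicated separately.
    • Duplicate Key Error: The eventId already exists, so we’ve processed this event before. Discard the message and acknowledge it to the broker so it isn’t redelivered. No business logic runs again.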
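The two steps above can be sketched in Python, with the standard-library sqlite3 module standing in for your production database. This is a minimal sketch, not a production consumer: the orders table and the handle_place_order function are hypothetical, and acknowledging the broker is left to the caller. The detail worth copying is that the dedup insert and the business write share one transaction.

```python
import sqlite3

# processed_events is the article's table; orders is a hypothetical stand-in
# for the real business tables.
SCHEMA = """
CREATE TABLE processed_events (
    event_id VARCHAR(36) PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
    order_id    TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL
);
"""


def handle_place_order(conn: sqlite3.Connection, event: dict) -> str:
    """Apply a PlaceOrder event at most once; return "processed" or "duplicate"."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises,
        # so the dedup marker and the order are saved together or not at all.
        with conn:
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["eventId"],),
            )
            conn.execute(
                "INSERT INTO orders (order_id, customer_id) VALUES (?, ?)",
                (event["orderId"], event["customerId"]),
            )
    except sqlite3.IntegrityError:
        # Unique-key violation: this eventId (or, in this sketch, this orderId)
        # was already recorded. The rollback left no partial writes behind.
        return "duplicate"
    return "processed"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    event = {
        "eventId": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
        "orderId": "ORD-98765",
        "customerId": "CUST-123",
    }
    print(handle_place_order(conn, event))  # processed
    print(handle_place_order(conn, event))  # duplicate (a redelivery)
```

Both return values mean "acknowledge the message"; any other exception propagates, so the caller leaves the message unacknowledged and the broker redelivers it.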

This simple check ensures that even if the message broker redelivers the same message multiple times (e.g., due to network issues or consumer crashes), the order is created only once.

The mental model you need: assume the broker may deliver any message more than once, and make your consumers idempotent so that it doesn’t matter. You can’t guarantee that a message is delivered exactly once, but you can guarantee that its effect is applied exactly once. This shifts the burden of "exactly once" processing from the transport layer to your application logic.

Idempotency is crucial for distributed systems that rely on asynchronous messaging, such as those built on Kafka, RabbitMQ, or AWS SQS. These systems typically provide "at least once" delivery, so duplicates are a real possibility. Without idempotency, "at least once" delivery becomes "at least once" execution of your business logic, which is often unacceptable.
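To make "at-least-once delivery, exactly-once effect" concrete, here is a small simulation, again assuming sqlite3. The consume function, the SimulatedCrash exception, and the delivery sequence are hypothetical: the first delivery dies after writing the dedup marker, the second succeeds, and the third is a redelivery whose acknowledgement was lost.

```python
import sqlite3

# Hypothetical minimal tables for the simulation.
SCHEMA = """
CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
CREATE TABLE orders (order_id TEXT PRIMARY KEY);
"""


class SimulatedCrash(Exception):
    """Stands in for the consumer process dying mid-message."""


def consume(conn, event_id, order_id, crash_after_marker=False) -> bool:
    """Process one delivery; return True if the message should be acknowledged."""
    try:
        with conn:  # one transaction for the marker and the business write
            conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
            if crash_after_marker:
                raise SimulatedCrash()
            conn.execute("INSERT INTO orders (order_id) VALUES (?)", (order_id,))
    except SimulatedCrash:
        return False  # rolled back and not acknowledged: the broker will redeliver
    except sqlite3.IntegrityError:
        return True  # duplicate: acknowledge without re-running business logic
    return True  # first successful processing: acknowledge


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Delivery 1 crashes after writing the marker; delivery 2 succeeds;
# delivery 3 arrives again because the broker never saw the ack.
acks = [consume(conn, "evt-1", "ORD-98765", crash_after_marker=c) for c in (True, False, False)]
orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(acks, orders)  # [False, True, True] 1
```

Three deliveries, one order: the crashed attempt rolled back its own marker, so the retry was not mistaken for a duplicate.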

Consider the different types of operations:

  • Idempotent operations: Operations that can be applied multiple times without changing the result beyond the initial application. Examples: setting a value, creating a resource under a caller-supplied unique ID (a repeat create with the same ID is rejected or ignored), marking an item as "shipped."
  • Non-idempotent operations: Operations whose outcome changes with each execution. Examples: incrementing a counter, deducting money from an account, appending to a log.
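A tiny illustration of the difference, with hypothetical function names: applying each operation twice, as a duplicate delivery would, leaves the idempotent one unchanged and the non-idempotent one off by one.

```python
def mark_shipped(record: dict) -> None:
    # Idempotent: sets a value, so repeating the call changes nothing.
    record["status"] = "shipped"


def increment_counter(record: dict) -> None:
    # Non-idempotent: every call changes the result.
    record["counter"] += 1


record = {"status": "pending", "counter": 0}
for _ in range(2):  # the same message delivered twice
    mark_shipped(record)
    increment_counter(record)

print(record)  # {'status': 'shipped', 'counter': 2}
```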

If you have non-idempotent operations that must be executed, you need to wrap them in an idempotent pattern. For instance, to make "deduct $10 from account X" idempotent, you might first check whether a deduction tagged with the given eventId has already been recorded. If it has, do nothing. If it hasn’t, perform the deduction and record its eventId in the same transaction.
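Here is a minimal sketch of that idempotent deduction, assuming sqlite3 and hypothetical accounts and deductions tables, with amounts in integer cents. One deliberate variation on the check-then-act wording above: instead of a separate SELECT followed by an INSERT, it inserts the eventId first and lets the primary key reject repeats, which closes the race between two concurrent deliveries of the same event.

```python
import sqlite3

SCHEMA = """
CREATE TABLE accounts (
    account_id    TEXT PRIMARY KEY,
    balance_cents INTEGER NOT NULL
);
-- One row per applied deduction; the primary key makes each eventId unique.
CREATE TABLE deductions (
    event_id     TEXT PRIMARY KEY,
    account_id   TEXT NOT NULL,
    amount_cents INTEGER NOT NULL
);
"""


def deduct_once(conn, event_id, account_id, amount_cents) -> bool:
    """Deduct at most once per event_id; return False if it was already applied."""
    try:
        with conn:
            # Recording the eventId doubles as the "already done?" check:
            # a repeat fails here, before the balance is touched.
            conn.execute(
                "INSERT INTO deductions (event_id, account_id, amount_cents) VALUES (?, ?, ?)",
                (event_id, account_id, amount_cents),
            )
            conn.execute(
                "UPDATE accounts SET balance_cents = balance_cents - ? WHERE account_id = ?",
                (amount_cents, account_id),
            )
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO accounts VALUES ('X', 5000)")  # $50.00

first = deduct_once(conn, "evt-42", "X", 1000)   # deduct $10
second = deduct_once(conn, "evt-42", "X", 1000)  # redelivery of the same event
balance = conn.execute("SELECT balance_cents FROM accounts WHERE account_id = 'X'").fetchone()[0]
print(first, second, balance)  # True False 4000
```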

The eventId isn’t the only way to achieve idempotency. You can also use a combination of business keys. For example, if you’re processing a PaymentReceived event, you might use orderId and paymentAmount as a composite key to check whether this specific payment for this order has already been recorded. Business keys can be tricky, though: two legitimate payments of the same amount for the same order would collide, and if the business data itself changes (e.g., the order amount changes after a payment was processed), a previously "successful" operation can start to look incorrect. A globally unique eventId is generally the most robust approach.
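A sketch of deduplicating on a business key, assuming sqlite3 and a hypothetical payments table with a UNIQUE (order_id, amount_cents) constraint. It rejects the redelivery as intended, but it also rejects a genuine second payment of the same amount, which is exactly the kind of trap that makes business keys tricky.

```python
import sqlite3

# Hypothetical payments table deduplicated on a business key instead of an eventId.
SCHEMA = """
CREATE TABLE payments (
    order_id     TEXT NOT NULL,
    amount_cents INTEGER NOT NULL,
    UNIQUE (order_id, amount_cents)
);
"""


def record_payment(conn, order_id, amount_cents) -> bool:
    """Record a PaymentReceived event; return False if the business key already exists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO payments (order_id, amount_cents) VALUES (?, ?)",
                (order_id, amount_cents),
            )
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
results = [
    record_payment(conn, "ORD-98765", 2500),  # first delivery: recorded
    record_payment(conn, "ORD-98765", 2500),  # redelivery: correctly rejected
    record_payment(conn, "ORD-98765", 2500),  # a genuine second $25 payment: also rejected
]
print(results)  # [True, False, False]
```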

The processed_events table isn’t the only place to keep this state. You could also store the idempotency token directly within the entity being modified. For example, when processing a PlaceOrder event, you could add an originalEventId field with a unique constraint to your orders table. If an order with that originalEventId already exists, you know the event is a duplicate. This co-locates the idempotency check with the actual business data, which can be more efficient but requires careful schema design and transaction management.
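A sketch of the co-located variant, assuming sqlite3 and storing originalEventId as a hypothetical original_event_id column with a UNIQUE constraint. It uses SQLite’s ON CONFLICT ... DO NOTHING clause (available since SQLite 3.24.0) and checks cursor.rowcount to tell a fresh insert from a duplicate; another database would need its own upsert or insert-ignore syntax.

```python
import sqlite3

# Hypothetical orders table that carries the idempotency token itself.
SCHEMA = """
CREATE TABLE orders (
    order_id          TEXT PRIMARY KEY,
    customer_id       TEXT NOT NULL,
    original_event_id TEXT NOT NULL UNIQUE
);
"""


def create_order(conn, event: dict) -> bool:
    """Insert the order unless one already exists for this eventId; return True if inserted."""
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (order_id, customer_id, original_event_id) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (original_event_id) DO NOTHING",
            (event["orderId"], event["customerId"], event["eventId"]),
        )
    # rowcount is 0 when the conflict clause skipped the insert.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
event = {
    "eventId": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
    "orderId": "ORD-98765",
    "customerId": "CUST-123",
}
first, second = create_order(conn, event), create_order(conn, event)
print(first, second)  # True False
```

The order row and its idempotency token are written by a single statement, so there is no separate marker table to keep in sync.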

The next challenge you’ll face is managing the lifecycle of these idempotency records; they can grow indefinitely and consume significant storage.
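One common approach, sketched here under assumptions, is a time-based purge that deletes markers older than a retention window. The function name and the 7-day window are hypothetical; the window must be longer than the longest time your broker could still redeliver a message, or a late duplicate would be treated as new.

```python
import sqlite3

SCHEMA = """
CREATE TABLE processed_events (
    event_id VARCHAR(36) PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def purge_processed_events(conn, retention_days: int) -> int:
    """Delete dedup markers older than the retention window; return how many were removed."""
    # CURRENT_TIMESTAMP and datetime('now', ...) both yield UTC text in
    # 'YYYY-MM-DD HH:MM:SS' form, so string comparison orders them correctly.
    with conn:
        cur = conn.execute(
            "DELETE FROM processed_events WHERE processed_at < datetime('now', ?)",
            (f"-{retention_days} days",),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute(
        "INSERT INTO processed_events (event_id, processed_at) VALUES (?, ?)",
        ("a1b2c3d4-e5f6-7890-1234-567890abcdef", "2023-10-27 10:00:00"),
    )
    conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", ("fresh-event",))

deleted = purge_processed_events(conn, retention_days=7)
print(deleted)  # 1
```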
