Lambda’s asynchronous and queue-based triggers deliver events at least once, which means you can’t assume your function will run exactly one time for a given event.

Let’s say you have a Lambda function triggered by an SQS queue. When a message arrives, Lambda processes it. If processing fails midway, the message becomes visible on the queue again and Lambda is invoked with it a second time, so the same message can be processed twice. Standard queues can also occasionally deliver a message more than once even when nothing fails. Either way, the result can be a customer charged twice or a notification sent twice.

// Example SQS record as it appears in event['Records'] (abridged)
{
  "messageId": "a1b2c3d4-e5f6-7890-1234-abcdef123456",
  "receiptHandle": "AQEB...",
  "body": "{\"orderId\": \"ORD12345\", \"customerId\": \"CUST987\"}"
}

Here’s how you might process this in Python:

import json
import time

import boto3

QUEUE_URL = 'YOUR_SQS_QUEUE_URL' # Placeholder: replace with your queue's URL

# Create clients once per execution environment, not once per message
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('OrderProcessingStatus') # DynamoDB table to track processed orders
sqs = boto3.client('sqs')

def lambda_handler(event, context):
    failed_order_ids = []

    for record in event['Records']:
        message_body = json.loads(record['body'])
        order_id = message_body['orderId']
        customer_id = message_body['customerId']

        # Check if this order has already been processed
        response = table.get_item(
            Key={'orderId': order_id}
        )

        if 'Item' in response:
            print(f"Order {order_id} already processed. Skipping.")
            # If already processed, delete the message from SQS to prevent further retries
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=record['receiptHandle']
            )
            continue # Move to the next message

        # --- Simulate order processing ---
        print(f"Processing order {order_id} for customer {customer_id}...")
        # In a real scenario, this would involve database updates, API calls, etc.
        # For demonstration, we'll just simulate success
        processing_successful = True
        # ---------------------------------

        if processing_successful:
            print(f"Successfully processed order {order_id}.")
            # Mark the order as processed in DynamoDB
            table.put_item(
                Item={
                    'orderId': order_id,
                    'customerId': customer_id,
                    'status': 'PROCESSED',
                    'processedTimestamp': int(time.time()) # Unix epoch seconds
                }
            )
            # Delete the message from SQS
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=record['receiptHandle']
            )
        else:
            print(f"Failed to process order {order_id}. Message will be reprocessed.")
            # Do NOT delete the message. SQS will re-deliver it after visibility timeout.
            # Or, you could send it to a Dead Letter Queue (DLQ) after several failed attempts.
            failed_order_ids.append(order_id)

    # Returning normally tells Lambda the whole batch succeeded, and Lambda then
    # deletes every message in it. Raise so the undeleted messages are redelivered.
    if failed_order_ids:
        raise RuntimeError(f"Failed to process orders: {failed_order_ids}")

The core idea is to have a unique identifier for each operation that you want to make idempotent. For order processing, orderId is a good candidate. You then use a persistent store (like DynamoDB) to record whether an operation associated with that identifier has already been completed.

When your Lambda function receives an event, it extracts this unique identifier. Before performing the actual work, it queries your persistent store. If the identifier is already present, it means the operation has been done before. In this case, you simply acknowledge the event (e.g., delete the SQS message) and exit. If the identifier is not found, you proceed with the operation, and upon successful completion, you record the identifier in your persistent store.

The critical part is making the existence check and the recording of the identifier atomic where possible. A separate read followed by a write leaves a window in which two concurrent invocations can both see "not processed" and both do the work; in DynamoDB, a conditional write closes that window. For SQS, delete the message only after the idempotency key has been recorded. If the Lambda fails after recording the key but before deleting the message, the message will be redelivered, but your idempotency check will correctly prevent duplicate work.
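One way to make the check and the record a single atomic step is a DynamoDB conditional write. The sketch below is a variation on the flow described above, not a drop-in replacement: it claims the order before the work starts instead of recording it afterwards. The function name `claim_order`, the `IN_PROGRESS` status, and the `FakeTable` stand-in are all hypothetical; with boto3 you would pass a real `Table` object, which raises `botocore.exceptions.ClientError` when the condition fails.

```python
def claim_order(table, order_id):
    """Return True if this call recorded order_id first, False if it already existed."""
    try:
        table.put_item(
            Item={"orderId": order_id, "status": "IN_PROGRESS"},
            # Write only if no item with this key exists yet
            ConditionExpression="attribute_not_exists(orderId)",
        )
        return True
    except Exception as exc:
        # boto3 raises botocore.exceptions.ClientError, which carries the
        # service error code in its .response dict
        response = getattr(exc, "response", None) or {}
        if response.get("Error", {}).get("Code") == "ConditionalCheckFailedException":
            return False
        raise  # Any other error is a real failure


class FakeConditionFailed(Exception):
    """Test double that mimics the shape of botocore's ClientError."""
    response = {"Error": {"Code": "ConditionalCheckFailedException"}}


class FakeTable:
    """In-memory stand-in for a DynamoDB Table, for local experimentation only."""
    def __init__(self):
        self.items = {}

    def put_item(self, Item, ConditionExpression=None):
        if ConditionExpression and Item["orderId"] in self.items:
            raise FakeConditionFailed()
        self.items[Item["orderId"]] = Item


table = FakeTable()
print(claim_order(table, "ORD12345"))  # first delivery claims the order
print(claim_order(table, "ORD12345"))  # redelivery finds it already claimed
```

Because the condition is evaluated by DynamoDB as part of the write, two concurrent invocations cannot both succeed for the same orderId. A claim-first design does need a way to release or expire a claim if the work then fails, which this sketch leaves out.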

The ReceiptHandle from SQS is unique per delivery, not per message. If SQS redelivers the same message, the ReceiptHandle will be different. This is why the ReceiptHandle can’t serve as an idempotency key when your business operation (like processing an order) needs to be idempotent. You need a business-level identifier.
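To illustrate the difference, here is a small sketch with two hand-written records standing in for two deliveries of the same message. The receipt handle values and the helper name `idempotency_key` are made up for the example.

```python
import json


def idempotency_key(record):
    """Derive the key from the message body, not from delivery metadata."""
    return json.loads(record["body"])["orderId"]


body = json.dumps({"orderId": "ORD12345", "customerId": "CUST987"})

# Two deliveries of the same message: same body, different receipt handles
first_delivery = {"receiptHandle": "handle-1", "body": body}
redelivery = {"receiptHandle": "handle-2", "body": body}

# The receipt handle changes between deliveries, so it cannot detect a repeat
assert first_delivery["receiptHandle"] != redelivery["receiptHandle"]

# The business-level key is stable across deliveries
assert idempotency_key(first_delivery) == idempotency_key(redelivery)
```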

What most people miss is the partial failure: the function crashes after the idempotency check has passed but before the state is updated. The SQS message isn’t deleted, so SQS redelivers it. If the state update had already been written, the check sees it and skips the message. If it had not, the check passes again and the work is repeated, including any side effects that completed before the crash. This is why the state update (e.g., put_item to DynamoDB) must be robust and must happen before you consider the operation "complete" for idempotency purposes. If the state update fails, treat the entire operation as failed and let the message be reprocessed or sent to a DLQ.
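When one record in a batch fails, an alternative to deleting messages by hand is to report only the failed records back to Lambda. The sketch below assumes the event source mapping has ReportBatchItemFailures enabled; without that setting, Lambda ignores the return value. The names `handle_batch`, `process_order`, and `flaky` are hypothetical, and `process_order` is expected to raise if either the work or the state update fails.

```python
import json


def handle_batch(event, process_order):
    """Process each record and report only the failed ones back to Lambda."""
    failures = []
    for record in event["Records"]:
        try:
            order = json.loads(record["body"])
            process_order(order)  # Raises if the work or the state update fails
        except Exception as exc:
            print(f"Record {record['messageId']} failed: {exc}")
            failures.append({"itemIdentifier": record["messageId"]})
    # Lambda deletes every message that is not listed here
    return {"batchItemFailures": failures}


def flaky(order):
    """Stand-in for real processing: fails for one specific order."""
    if order["orderId"] == "ORD2":
        raise RuntimeError("state update failed")


event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"orderId": "ORD1"})},
    {"messageId": "m2", "body": json.dumps({"orderId": "ORD2"})},
]}
result = handle_batch(event, flaky)
print(result)
```

Only the record for ORD2 is returned to the queue; ORD1 is deleted along with the rest of the batch, so a single bad message does not force the others through the idempotency check again.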

The next challenge is handling idempotency for operations that don’t have a natural business-level identifier, or when your downstream systems might also need to be idempotent.
