DynamoDB Streams capture every item-level change to a DynamoDB table and expose those changes as an ordered sequence of records in a stream. Lambda functions can then read these records and react to them in near real time.

Let’s see it in action. Imagine we have a Products table in DynamoDB with a product_id (string, partition key) and price (number). We want to update a separate PriceHistory table whenever a product’s price changes.

First, enable DynamoDB Streams on your Products table. You can do this in the AWS console under the table’s "Exports and streams" tab, or with the AWS CLI:

aws dynamodb update-table --table-name Products --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

NEW_AND_OLD_IMAGES means our Lambda will receive both the state of the item before and after the change, which is useful for detecting specific modifications like price updates.
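
To make the two images concrete, here is a minimal sketch of what a trimmed MODIFY record might look like and how a price change could be detected from it. The sample values and the helper name price_changed are made up for illustration. Note that stream images arrive in DynamoDB's typed JSON, where a number is a string stored under an N key.

```python
from decimal import Decimal

# A trimmed, made-up MODIFY record. Real records carry more fields
# (eventID, awsRegion, SequenceNumber, ...). Attribute values use
# DynamoDB's typed JSON: {'S': ...} for strings, {'N': ...} for numbers.
sample_record = {
    'eventName': 'MODIFY',
    'dynamodb': {
        'Keys': {'product_id': {'S': 'p-100'}},
        'OldImage': {'product_id': {'S': 'p-100'}, 'price': {'N': '19.99'}},
        'NewImage': {'product_id': {'S': 'p-100'}, 'price': {'N': '17.50'}},
    },
}


def price_changed(record):
    """Return (old, new) as Decimals if the price differs, else None."""
    if record.get('eventName') != 'MODIFY':
        return None
    old = record['dynamodb'].get('OldImage', {}).get('price')
    new = record['dynamodb'].get('NewImage', {}).get('price')
    if old is None or new is None:
        return None
    # Compare as Decimal so '19.990' and '19.99' count as the same price.
    old_price, new_price = Decimal(old['N']), Decimal(new['N'])
    return (old_price, new_price) if old_price != new_price else None


print(price_changed(sample_record))
```

With a view type such as KEYS_ONLY the OldImage and NewImage entries would be absent, so a check like this would have nothing to compare.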

Now, create a Lambda function. Here’s a Python example:

import json
import boto3
from boto3.dynamodb.types import TypeDeserializer

dynamodb = boto3.resource('dynamodb')
price_history_table = dynamodb.Table('PriceHistory')
deserializer = TypeDeserializer()

def to_python(image):
    # Stream images use DynamoDB's typed JSON, e.g. {'price': {'N': '9.99'}}.
    # Convert them to plain Python values (numbers become Decimal).
    return {key: deserializer.deserialize(value) for key, value in image.items()}

def lambda_handler(event, context):
    for record in event['Records']:
        # Check if the record is an update and if the price changed
        if record['eventName'] == 'MODIFY':
            old_image = to_python(record['dynamodb']['OldImage'])
            new_image = to_python(record['dynamodb']['NewImage'])

            if 'price' in old_image and 'price' in new_image and old_image['price'] != new_image['price']:
                product_id = new_image['product_id']
                new_price = new_image['price']
                old_price = old_image['price']

                price_history_table.put_item(
                    Item={
                        'product_id': product_id,
                        # Unix timestamp in seconds, as delivered in the event
                        'timestamp': int(record['dynamodb']['ApproximateCreationDateTime']),
                        'old_price': old_price,
                        'new_price': new_price
                    }
                )
                print(f"Price updated for {product_id}: {old_price} -> {new_price}")
    return {
        'statusCode': 200,
        'body': json.dumps('Successfully processed stream records')
    }

To trigger this Lambda function from the DynamoDB Stream, you need to create an event source mapping. In the AWS console, go to your Lambda function, click "Add trigger," select "DynamoDB," choose your Products table, and ensure "Batch size" is set to 100 (the default) and "Starting position" is LATEST. Or via CLI:

aws lambda create-event-source-mapping \
    --function-name your-lambda-function-name \
    --event-source-arn arn:aws:dynamodb:your-region:your-account-id:table/Products/stream/your-stream-id \
    --batch-size 100 \
    --starting-position LATEST

The event-source-arn can be found in your DynamoDB table’s stream details.
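
The same setup can also be scripted. The sketch below looks up the table's current stream ARN with describe_table and passes it to create_event_source_mapping. The function name connect_stream_to_lambda is made up, and the two clients are passed in as parameters so the snippet does not assume any particular credentials or region setup.

```python
def connect_stream_to_lambda(dynamodb_client, lambda_client, table_name, function_name):
    """Look up the table's latest stream ARN and map it to the function."""
    table = dynamodb_client.describe_table(TableName=table_name)['Table']
    # LatestStreamArn is only present when a stream is enabled on the table.
    stream_arn = table.get('LatestStreamArn')
    if stream_arn is None:
        raise RuntimeError(f"No stream is enabled on table {table_name}")
    return lambda_client.create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName=function_name,
        BatchSize=100,
        StartingPosition='LATEST',
    )
```

With boto3 installed, a call might look like connect_stream_to_lambda(boto3.client('dynamodb'), boto3.client('lambda'), 'Products', 'your-lambda-function-name').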

Now, when you update a product’s price in the Products table, DynamoDB will write an event to its stream. The Lambda function will be invoked, process the event, and if the price has changed, it will write a new item to the PriceHistory table.

The most surprising thing about DynamoDB Streams is that they are not a message queue, even though they behave similarly. In a traditional queue, a message is deleted once it has been processed. Stream records are never deleted by consumers: they stay in the stream for 24 hours and are read shard by shard, in order. If your Lambda function fails on a batch, Lambda by default retries that same batch until it succeeds or the records expire, and the records behind it in the same shard wait in the meantime. This gives you reliable, ordered processing of data changes, which suits event-driven architectures, as long as a persistently failing record is not left to block its shard.
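
One way to limit how much work is repeated on a retry is Lambda's partial batch response, where the handler reports which record failed. The sketch below assumes the event source mapping has ReportBatchItemFailures enabled in its function response types. process_record is a hypothetical placeholder for the real per-record work.

```python
def process_record(record):
    """Hypothetical per-record work; raises if the record cannot be handled."""
    if 'NewImage' not in record['dynamodb']:
        raise ValueError('record has no NewImage')


def lambda_handler(event, context):
    for record in event['Records']:
        try:
            process_record(record)
        except Exception:
            # Report the first failed record and stop. For stream sources,
            # Lambda retries from this sequence number onward, so earlier
            # records in the batch are not reprocessed.
            return {'batchItemFailures': [
                {'itemIdentifier': record['dynamodb']['SequenceNumber']}
            ]}
    # An empty list tells Lambda the whole batch succeeded.
    return {'batchItemFailures': []}
```

Stopping at the first failure keeps processing in order, at the cost of handing the remaining records back to Lambda for the next attempt.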

You control the behavior of the stream and Lambda integration in three places: the StreamViewType you choose when enabling the stream, the eventName and dynamodb fields your code reads from the event payload, and the configuration of the event source mapping, particularly BatchSize and StartingPosition. The StartingPosition is crucial: LATEST means you only process records written after the event source mapping starts reading, while TRIM_HORIZON means you start from the oldest record still retained in the stream (at most 24 hours back), which can be useful for catching up on recent changes or recovering from failures.

The one thing most people don’t realize is that the ApproximateCreationDateTime in the dynamodb object is an approximate Unix epoch timestamp with a resolution of seconds, not milliseconds, and it’s not guaranteed to be monotonically increasing across different shards of the stream. This means you can’t simply sort stream records by this timestamp to reconstruct a perfect chronological order of operations if your table has high write throughput across multiple partition keys.
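
A short sketch of why this timestamp is too coarse for ordering, assuming the value arrives as epoch seconds, which is how it appears in Lambda's DynamoDB event payloads. The two records and the helper name creation_time are made up for illustration.

```python
from datetime import datetime, timezone


def creation_time(record):
    """Convert ApproximateCreationDateTime (epoch seconds) to an aware UTC datetime."""
    seconds = record['dynamodb']['ApproximateCreationDateTime']
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Two made-up records created within the same second carry the same value,
# so this field alone cannot tell which change happened first.
first = {'dynamodb': {'ApproximateCreationDateTime': 1479499740}}
second = {'dynamodb': {'ApproximateCreationDateTime': 1479499740}}

print(creation_time(first).isoformat())                  # 2016-11-18T20:09:00+00:00
print(creation_time(first) == creation_time(second))     # True
```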

The next thing you’ll want to understand is how to handle duplicate events. DynamoDB Streams writes each change to the stream exactly once, but Lambda invokes your function at least once per batch, so a retried batch can hand your function records it has already processed.
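
As a small preview, one common approach is to make the handler idempotent by remembering which events were already handled. The sketch below keys on each record's eventID and uses an in-memory set purely for illustration. In a real Lambda that state would need a durable store, since memory is not shared across invocations. process_once and handle are hypothetical names.

```python
def process_once(records, seen_event_ids, handle):
    """Call handle(record) only for records whose eventID has not been seen."""
    handled = 0
    for record in records:
        event_id = record['eventID']
        if event_id in seen_event_ids:
            continue  # duplicate delivery, e.g. after a retried batch
        handle(record)
        # Mark as seen only after handle() succeeds, so a failure is retried.
        seen_event_ids.add(event_id)
        handled += 1
    return handled
```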
