Lambda functions don’t just run code; they run it inside a specific amount of memory, and that memory allocation is the single most important lever for tuning performance and cost.

Let’s see this in action. Imagine we have a function that processes images. Without tuning, it might be set to 128MB:

# lambda_function.py
import time
import boto3

def lambda_handler(event, context):
    start_time = time.time()
    s3 = boto3.client('s3')

    # Simulate image processing: download, resize, upload
    bucket_name = event['bucket']
    image_key = event['key']
    new_key = "processed/" + image_key.split('/')[-1]

    try:
        response = s3.get_object(Bucket=bucket_name, Key=image_key)
        image_data = response['Body'].read()

        # Stand-in for a CPU-bound step such as resizing.
        # A real function would use an image library like Pillow here.
        processed_data = image_data * 2  # Placeholder only: duplicates the bytes

        s3.put_object(Bucket=bucket_name, Key=new_key, Body=processed_data)

        end_time = time.time()
        duration = end_time - start_time
        # Configured memory limit (the setting we're tuning), not actual usage
        memory_limit = context.memory_limit_in_mb

        return {
            'statusCode': 200,
            'body': f"Image {image_key} processed successfully. Duration: {duration:.2f}s, Memory limit: {memory_limit}MB"
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': f"Error processing {image_key}: {e}"
        }

If we invoke this with {"bucket": "my-image-bucket", "key": "raw/large_image.jpg"}, and it’s configured with 128MB, it might take 15 seconds. Now, let’s say we increase the memory to 512MB. The same function, with the exact same code, might now complete in 5 seconds. Why? Because Lambda provisions CPU power proportionally to memory. More memory means more CPU, leading to faster execution for CPU-bound tasks.

The problem we’re solving is finding the sweet spot: enough memory for fast execution, but not so much that we’re overpaying for unused resources. Lambda’s compute charge is based on memory * duration (there is also a small per-request fee that the memory setting doesn’t affect). If doubling the memory halves the duration, the memory * duration product stays the same; if it cuts the duration by more than half, the product, and with it the cost, goes down.

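Here’s how Lambda allocates resources: CPU scales in proportion to configured memory. According to AWS, a function has the equivalent of one full vCPU at 1,769MB, and the allocation tops out at six vCPUs at the 10,240MB maximum. A 128MB function therefore gets only a small fraction of a vCPU, while a 1024MB function gets a bit more than half of one. More CPU does not translate into a proportional speedup for every workload, though. For I/O-bound tasks (like waiting for network requests or disk reads), the CPU isn’t the bottleneck, so increasing memory might not significantly improve performance, but it will increase cost.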
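To make the CPU-bound versus I/O-bound difference concrete, here is a minimal sketch of a toy model. It assumes CPU share grows linearly with memory, anchored at the documented point of one vCPU at 1,769MB; the linear interpolation, the function names, and the sample workloads are all illustrative assumptions, not an AWS formula or measured data.

```python
# Rough model, not an AWS formula: CPU share is assumed to scale linearly
# with memory, anchored at the documented one vCPU at 1,769 MB.
MB_PER_VCPU = 1769
MAX_VCPUS = 6  # documented ceiling, reached at the 10,240 MB maximum


def estimated_vcpus(memory_mb: int) -> float:
    """Approximate vCPU-equivalents for a memory setting (assumption)."""
    return min(memory_mb / MB_PER_VCPU, MAX_VCPUS)


def estimated_duration(memory_mb: int, cpu_seconds: float, io_seconds: float) -> float:
    """Hypothetical model: I/O wait is fixed, CPU work shrinks with CPU share.

    cpu_seconds is the CPU work as measured on one full vCPU. The code is
    assumed to be single-threaded, so the usable share is capped at 1.0.
    """
    share = min(estimated_vcpus(memory_mb), 1.0)
    return io_seconds + cpu_seconds / share


# Made-up workloads: one dominated by CPU work, one dominated by I/O wait.
for memory_mb in (128, 512, 1769):
    cpu_bound = estimated_duration(memory_mb, cpu_seconds=1.0, io_seconds=0.1)
    io_bound = estimated_duration(memory_mb, cpu_seconds=0.05, io_seconds=2.0)
    print(f"{memory_mb:>5} MB  cpu-bound {cpu_bound:6.2f}s  io-bound {io_bound:5.2f}s")
```

In this model the CPU-bound workload speeds up sharply as memory grows, while the I/O-bound one barely moves. Real functions should be measured rather than predicted this way.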

The internal mechanism is that Lambda runs your code in a Firecracker microVM. The memory allocated directly dictates the resources assigned to that VM. When you increase memory, Lambda provides more RAM and, critically, more processing power (vCPUs) to that isolated environment. This is why compute-bound tasks see dramatic improvements.

To find the optimal configuration, you need to experiment. Each invocation writes a REPORT line to CloudWatch Logs (also shown in the Lambda console’s test output) listing Duration, Billed Duration, Memory Size, and Max Memory Used. A common strategy is to start with a baseline (e.g., 128MB or 256MB) and then double the memory at each step (256MB, 512MB, 1024MB, 2048MB, etc.). For each setting, run several invocations and record the execution duration and the actual memory consumed.
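If you collect those log lines yourself, a small parser makes the comparison easier. The sketch below assumes the usual REPORT line layout (Duration, Billed Duration, Memory Size, Max Memory Used, separated by tabs); the function name and the sample line, including its request ID and numbers, are made up for illustration.

```python
import re
from typing import Optional

# Matches the numeric fields of a Lambda REPORT log line.
REPORT_PATTERN = re.compile(
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_size_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_memory_used_mb>\d+) MB"
)


def parse_report_line(line: str) -> Optional[dict]:
    """Return the REPORT fields as floats, or None if the line doesn't match."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


# Illustrative line with invented values, not real output.
sample = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 1502.34 ms\tBilled Duration: 1503 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 211 MB"
)
print(parse_report_line(sample))
```

Comparing Max Memory Used against Memory Size also tells you how much headroom a setting leaves before the function risks running out of memory.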

You can use AWS X-Ray to trace requests and get detailed timing information for different parts of your function. For a more programmatic approach, you can instrument your code to log context.memory_limit_in_mb and context.get_remaining_time_in_millis() (the method name on the Python context object).
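One way to do that instrumentation without touching the handler body is a decorator. This is a minimal sketch: the decorator name, the JSON field names, and the FakeContext stand-in are hypothetical, while memory_limit_in_mb and get_remaining_time_in_millis() are the attributes the Python context object exposes.

```python
import functools
import json
import time


def log_invocation_metrics(handler):
    """Wrap a handler and print one JSON line with timing and configuration."""
    @functools.wraps(handler)
    def wrapper(event, context):
        start = time.perf_counter()
        try:
            return handler(event, context)
        finally:
            # Runs on success and on failure, so every invocation is logged.
            print(json.dumps({
                "memory_limit_mb": int(context.memory_limit_in_mb),
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
                "remaining_ms": context.get_remaining_time_in_millis(),
            }))
    return wrapper


class FakeContext:
    """Local stand-in for the Lambda context object (hypothetical, for testing)."""
    memory_limit_in_mb = 512

    def get_remaining_time_in_millis(self):
        return 2500


@log_invocation_metrics
def lambda_handler(event, context):
    return {"statusCode": 200}


if __name__ == "__main__":
    lambda_handler({}, FakeContext())
```

Anything printed to stdout in Lambda ends up in CloudWatch Logs, so the JSON lines can be queried alongside the REPORT lines.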

The most surprising thing is how often the optimal memory configuration isn’t a power of two. While you might test 128, 256, 512, 1024, and 2048, you might find that 1792MB gives you the best balance of speed and cost for a particular workload, offering a performance edge over 1024MB without the price jump to 2048MB. One likely reason is that a function reaches the equivalent of one full vCPU at 1,769MB, so single-threaded code gains little extra CPU beyond that point. It can also simply be that your application’s resource needs happen to hit a sweet spot between the usual test values.

Once you’ve identified a few promising memory settings, you’ll want to calculate the cost per invocation. Lambda prices compute per GB-second, with 1GB equal to 1024MB. For example, if a function runs for 10 seconds at 512MB, the cost is roughly (512MB / 1024MB per GB) * 10s * $0.0000166667 per GB-second, or about $0.0000833 (adjust the rate for your region and architecture). If doubling memory to 1024MB reduces duration to 4 seconds, the cost is (1024MB / 1024MB per GB) * 4s * $0.0000166667, or about $0.0000667, so the larger setting is both faster and cheaper. You’re looking for the lowest memory * duration product that meets your latency requirements.
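The same arithmetic is easy to script once you have a handful of measurements. The sketch below uses the per-GB-second rate quoted above as an assumption (check current pricing for your region and architecture), ignores the per-request fee, and uses hypothetical function names.

```python
# Assumption: rate quoted in the text; varies by region and architecture.
PRICE_PER_GB_SECOND = 0.0000166667
MB_PER_GB = 1024


def invocation_cost(memory_mb, duration_s, price=PRICE_PER_GB_SECOND):
    """Compute charge for one invocation, excluding the per-request fee."""
    return (memory_mb / MB_PER_GB) * duration_s * price


def cheapest_within_latency(measurements, max_duration_s):
    """Pick the cheapest (memory_mb, duration_s) pair that meets the latency target.

    Returns None when no measurement is fast enough.
    """
    eligible = [(m, d) for m, d in measurements if d <= max_duration_s]
    if not eligible:
        return None
    return min(eligible, key=lambda pair: invocation_cost(*pair))


# The two configurations from the example above.
measurements = [(512, 10.0), (1024, 4.0)]
for memory_mb, duration_s in measurements:
    print(f"{memory_mb} MB, {duration_s}s: ${invocation_cost(memory_mb, duration_s):.10f}")
print("Best under 5s:", cheapest_within_latency(measurements, max_duration_s=5.0))
```

For real decisions, feed in averaged billed durations from several invocations per setting rather than a single run.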

The next hurdle after optimizing memory is understanding how function cold starts impact your overall latency.
