AWS Lambda functions running on ARM (Graviton2) processors can offer better performance and lower cost than their x86 counterparts. AWS cites up to 34% better price-performance, though the actual gain depends on the workload.

Let’s see it in action. Imagine we have a simple Python function that performs some basic calculations. We’ll deploy it twice, once on x86 and once on ARM, and then benchmark them using a tool like aws-lambda-power-tuning.

First, here’s a basic Python function we might use:

import time
import json

def lambda_handler(event, context):
    start_time = time.perf_counter()  # monotonic clock, suited to timing intervals
    # Simulate some CPU-bound computation
    result = 0
    for i in range(10**7):
        result += i
    end_time = time.perf_counter()
    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': 'Calculation complete!',
            'duration_seconds': end_time - start_time,
            'result_sum': result  # Returned so the computation has an observable result
        })
    }

Now, let’s consider deploying this. When you create a Lambda function, you select its instruction set architecture: x86_64 is the default, and ARM must be chosen explicitly as arm64. Memory allocation is the other key lever we’ll use for tuning.
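As a minimal sketch of where the architecture setting lives, the helper below assembles function settings in the shape that boto3’s create_function call accepts. The helper name, role ARN, runtime, handler and timeout are placeholders assumed for illustration; the Architectures and MemorySize fields are the point.

```python
# Hypothetical helper: builds settings shaped for boto3's Lambda create_function call.
# The role ARN, runtime, handler and timeout are placeholders, not values from this article.
VALID_ARCHITECTURES = ("x86_64", "arm64")


def build_function_config(name, role_arn, architecture="x86_64", memory_mb=128, timeout_s=30):
    """Return a dict of function settings for one architecture."""
    if architecture not in VALID_ARCHITECTURES:
        raise ValueError(f"architecture must be one of {VALID_ARCHITECTURES}")
    return {
        "FunctionName": name,
        "Runtime": "python3.12",
        "Role": role_arn,
        "Handler": "lambda_function.lambda_handler",
        "Architectures": [architecture],  # a one-element list
        "MemorySize": memory_mb,          # in MB
        "Timeout": timeout_s,             # in seconds
    }


role = "arn:aws:iam::123456789012:role/example-lambda-role"  # placeholder ARN
x86_config = build_function_config("my-lambda-x86", role)
arm_config = build_function_config("my-lambda-arm", role, architecture="arm64")
print(x86_config["Architectures"], arm_config["Architectures"])
```

With boto3 installed and credentials configured, a dict like this could be passed as boto3.client("lambda").create_function(**arm_config, Code={"ZipFile": zip_bytes}), where zip_bytes holds your deployment package.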

A single invocation of each function with the AWS CLI is a quick smoke test before benchmarking. A tool such as aws-lambda-power-tuning then runs many invocations at each memory setting and averages the results:

x86 Benchmark (e.g., 128MB memory, 1000ms timeout)

# Example invocation for benchmarking (using AWS CLI)
aws lambda invoke \
    --function-name my-lambda-x86 \
    --payload '{}' \
    output_x86.json \
    --cli-binary-format raw-in-base64-out

ARM Benchmark (e.g., 128MB memory, 1000ms timeout)

# Example invocation for benchmarking (using AWS CLI)
aws lambda invoke \
    --function-name my-lambda-arm \
    --payload '{}' \
    output_arm.json \
    --cli-binary-format raw-in-base64-out
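To compare the two response files, something like the following could be used. It is a sketch that assumes the response shape returned by the handler above (a JSON body string containing duration_seconds); the sample payloads are made up so the snippet runs without AWS access.

```python
import json


def extract_duration(response_text):
    """Pull duration_seconds out of a handler response whose body is a JSON string."""
    response = json.loads(response_text)
    body = json.loads(response["body"])  # the handler serialised the body with json.dumps
    return body["duration_seconds"]


def compare(x86_text, arm_text):
    """Return both durations and the relative change of ARM versus x86."""
    x86_s = extract_duration(x86_text)
    arm_s = extract_duration(arm_text)
    return {"x86_s": x86_s, "arm_s": arm_s, "arm_vs_x86": (arm_s - x86_s) / x86_s}


# Made-up sample payloads standing in for output_x86.json and output_arm.json.
sample_x86 = json.dumps({"statusCode": 200, "body": json.dumps({"duration_seconds": 0.5})})
sample_arm = json.dumps({"statusCode": 200, "body": json.dumps({"duration_seconds": 0.35})})
comparison = compare(sample_x86, sample_arm)
print(comparison)
```

For real runs, the file contents can be read with pathlib.Path("output_x86.json").read_text(). A single invocation is noisy, so treat one pair of numbers as a sanity check rather than a benchmark.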

After running these through a power tuning tool, you’d observe results like this (these are illustrative numbers, actual results vary):

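  • x86 (128MB): Average duration ~500ms, duration charge ~$0.00000104 per invocation.
  • ARM (128MB): Average duration ~350ms, duration charge ~$0.00000058 per invocation.

These costs cover the duration charge only, at assumed rates of $0.0000166667 (x86) and $0.0000133334 (ARM) per GB-second. The per-request charge is the same on both architectures.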
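The duration charge for an invocation follows from memory, billed duration and a per-GB-second rate. Here is a rough sketch of that arithmetic. The rates are assumptions based on published on-demand prices at one point in time, so check the current AWS pricing page for your region, and note that the per-request charge is left out.

```python
# Assumed on-demand rates in USD per GB-second; verify against current AWS pricing.
PRICE_PER_GB_SECOND = {"x86_64": 0.0000166667, "arm64": 0.0000133334}


def compute_cost(architecture, memory_mb, duration_ms):
    """Duration charge for one invocation: GB allocated x seconds billed x rate."""
    gb = memory_mb / 1024
    seconds = duration_ms / 1000
    return gb * seconds * PRICE_PER_GB_SECOND[architecture]


# The illustrative durations from above, both at 128MB.
x86_cost = compute_cost("x86_64", 128, 500)
arm_cost = compute_cost("arm64", 128, 350)
saving = 1 - arm_cost / x86_cost
print(f"x86: ${x86_cost:.10f}  arm: ${arm_cost:.10f}  saving: {saving:.0%}")
```

Under these assumptions the saving comes from two multiplied factors: the shorter duration and the lower arm64 rate.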

This isn’t just about raw speed; it’s about how the system achieves that speed and at what cost. Graviton2 processors implement the 64-bit ARMv8 architecture on Arm Neoverse N1 cores, a design aimed at efficient server workloads. Lambda abstracts away the underlying hardware, but the choice of architecture determines which CPU instructions are available and how efficiently your code executes. For many CPU-bound workloads the uplift is substantial, though it varies with the runtime and any native libraries involved. The gain translates into lower cost in two ways: Lambda bills by duration, so a faster function costs less per invocation, and the per-GB-second price for arm64 is lower than for x86_64 (about 20% lower at the time of writing).

The core problem this solves is the historical dominance of x86 in cloud computing, which produced a de facto standard that wasn’t always the most cost-effective or performant choice for every workload. Lambda’s ARM support gives developers a choice, so they can optimize for specific application needs. When you select arm64, AWS provisions your function’s execution environment on Graviton2 hosts rather than Intel or AMD x86 hosts. The managed runtimes (such as Node.js, Python and Java) are built for ARM, and the Lambda service handles the rest.

The key levers you control are:

  1. Architecture: Explicitly choose arm64 during function creation or update.
  2. Memory Allocation: Lambda allocates CPU in proportion to the configured memory, so this one setting controls both. In some cases you can reduce memory on ARM and still match or beat a higher-memory x86 function.
  3. Runtime: Ensure your chosen runtime supports arm64. The current managed runtimes for Node.js, Python, Java and .NET do. For Go, other compiled languages and custom runtimes, you’ll need to compile your code for the aarch64 target.

The most surprising thing for many is how much memory allocation can affect performance, and that the effect can differ between ARM and x86. Because Lambda allocates CPU in proportion to memory, a CPU-bound function at 256MB may run roughly twice as fast as at 128MB, sometimes more. In some benchmarks the ARM version gains more from each step up than the x86 version (say 2.5x against 1.8x, as purely illustrative figures), but this depends on the workload and is not guaranteed, so measure both. This non-linear scaling is critical for fine-tuning cost and performance: doubling memory doubles the price per millisecond, so the step pays for itself only when duration falls by more than half.
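To make the trade-off concrete, here is a small sketch that picks the cheapest memory setting from a set of measured durations. Within one architecture the rate is constant, so comparing billed GB-seconds is enough. The durations are invented for illustration and should be replaced with your own benchmark results.

```python
def gb_seconds(memory_mb, duration_ms):
    """Billed compute for one invocation; cost is this times a fixed per-architecture rate."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def cheapest_setting(measurements):
    """measurements maps memory in MB to average duration in ms; returns the cheapest memory size."""
    billed = {mb: gb_seconds(mb, ms) for mb, ms in measurements.items()}
    return min(billed, key=billed.get)


# Invented durations for illustration only.
arm_measurements = {128: 900, 256: 400, 512: 230}
best_mb = cheapest_setting(arm_measurements)
speedup_128_to_256 = arm_measurements[128] / arm_measurements[256]
print(f"cheapest: {best_mb}MB, 128MB to 256MB speedup: {speedup_128_to_256:.2f}x")
```

In this made-up data the 256MB setting wins because the duration more than halves when memory doubles, while the step to 512MB does not pay for itself.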

When you switch an existing Lambda function from x86 to ARM, the main potential hurdle usually isn’t the runtime itself but any native dependencies or compiled libraries your function relies on. These must be available for the aarch64 architecture. If you’re using Python with libraries like numpy or pandas, make sure the versions you install ship pre-compiled ARM wheels or can be built in an ARM environment. Many common libraries now support ARM, but custom or less common ones might require extra effort.
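One way to check a dependency bundle is to look at the platform tag in each wheel filename. The sketch below assumes standard wheel naming, where the platform tag is the last dash-separated field. The function names and example filenames are ours, chosen for illustration.

```python
import platform

# platform.machine() values mapped to Lambda architecture names.
MACHINE_TO_LAMBDA_ARCH = {"aarch64": "arm64", "arm64": "arm64", "x86_64": "x86_64", "AMD64": "x86_64"}


def current_lambda_arch():
    """Best-effort name of the architecture this interpreter is running on."""
    return MACHINE_TO_LAMBDA_ARCH.get(platform.machine(), "unknown")


def wheel_supports_arm64(wheel_filename):
    """True if a wheel is pure Python ('any') or built for Linux on aarch64."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("expected a .whl filename")
    platform_tag = wheel_filename[:-4].split("-")[-1]  # last dash-separated field
    # A compressed tag set lists several platforms separated by dots.
    return any(
        tag == "any" or ("linux" in tag and tag.endswith("_aarch64"))
        for tag in platform_tag.split(".")
    )


# Example filenames for illustration.
for name in (
    "numpy-1.26.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
    "numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "requests-2.31.0-py3-none-any.whl",
):
    print(name, wheel_supports_arm64(name))
print("running on:", current_lambda_arch())
```

Calling current_lambda_arch() inside a handler is also a cheap way to confirm which architecture a deployed function is actually running on.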

The next challenge you’ll likely face is optimizing your Lambda function’s memory allocation on ARM to find the sweet spot between performance and cost.
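As a starting point for that tuning, the sketch below builds an execution input for the aws-lambda-power-tuning state machine. The field names follow the tool’s README as we understand it, so verify them against the version you deploy. The function ARN is a placeholder and the helper name is ours.

```python
import json


def power_tuning_input(function_arn, power_values, invocations=50, strategy="cost"):
    """Build an execution input dict for the aws-lambda-power-tuning state machine."""
    if strategy not in ("cost", "speed", "balanced"):
        raise ValueError("strategy must be 'cost', 'speed' or 'balanced'")
    return {
        "lambdaARN": function_arn,            # function to benchmark
        "powerValues": sorted(power_values),  # memory sizes in MB to test
        "num": invocations,                   # invocations per memory size
        "payload": {},                        # event passed to each invocation
        "parallelInvocation": True,
        "strategy": strategy,
    }


arm_arn = "arn:aws:lambda:us-east-1:123456789012:function:my-lambda-arm"  # placeholder ARN
tuning_input = power_tuning_input(arm_arn, [512, 128, 256, 1024])
print(json.dumps(tuning_input, indent=2))
```

Running the same input against the x86 function’s ARN gives two result sets that can be compared setting by setting.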
