Lambda functions are getting a storage upgrade, and it’s not just about more space.
Here’s a Lambda function running with 10GB of /tmp storage, writing a large file and then reading it back:
```python
import json
import os
import time


def lambda_handler(event, context):
    file_path = '/tmp/large_file.txt'
    file_size_gb = 5  # We'll create a 5GB file
    file_size_bytes = file_size_gb * 1024 * 1024 * 1024

    print(f"Attempting to write {file_size_gb}GB to /tmp...")
    start_time = time.time()

    try:
        with open(file_path, 'wb') as f:
            # Write in chunks to avoid excessive memory usage
            chunk_size = 1024 * 1024  # 1MB chunks
            for _ in range(file_size_bytes // chunk_size):
                f.write(os.urandom(chunk_size))
        write_time = time.time() - start_time
        print(f"Successfully wrote file in {write_time:.2f} seconds.")

        print("Attempting to read the file back...")
        start_time = time.time()
        read_bytes = 0
        with open(file_path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                read_bytes += len(chunk)
        read_time = time.time() - start_time
        print(f"Successfully read {read_bytes} bytes in {read_time:.2f} seconds.")

        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': 'File operation successful!',
                'file_size_gb': file_size_gb,
                'write_time_seconds': write_time,
                'read_time_seconds': read_time
            })
        }
    except Exception as e:
        print(f"An error occurred: {e}")
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
```
This code demonstrates writing a 5GB file to /tmp and then reading it back. When you execute this on a Lambda function configured with the default 512MB of ephemeral storage, it will fail with an ENOSPC (No space left on device) error, which Python raises as an OSError. However, if you configure the same function with 10GB of ephemeral storage, it will succeed, provided the function timeout is long enough for the write and read to finish (the default timeout is only 3 seconds).
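A pre-flight check can turn that ENOSPC failure into an early, explicit decision. The following is a minimal sketch using the standard library's `shutil.disk_usage`; the helper name `has_free_space` and the `headroom_bytes` margin are illustrative assumptions, not part of any Lambda API.

```python
import shutil
import tempfile


def has_free_space(path, required_bytes, headroom_bytes=0):
    """Return True if the filesystem holding `path` can fit `required_bytes`.

    `headroom_bytes` is a hypothetical safety margin, not a Lambda setting.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes + headroom_bytes


if __name__ == "__main__":
    # On Lambda the temp directory is normally /tmp.
    target_dir = tempfile.gettempdir()
    five_gb = 5 * 1024 ** 3
    print(f"Room for 5GB in {target_dir}: {has_free_space(target_dir, five_gb)}")
```

Calling a check like this before the write loop lets the handler return a clear error instead of failing partway through a multi-gigabyte write.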
The core problem this solves is the historically cramped /tmp directory in AWS Lambda. By default, Lambda functions are provisioned with 512MB of ephemeral storage in /tmp. This space is crucial for:
- Temporary File Storage: Storing intermediate results, downloaded assets, or generated content that doesn’t fit into memory.
- Libraries and Dependencies: Some libraries, especially those with native extensions or large datasets, might unpack or extract files to /tmp during execution.
- Databases: Embedded databases like SQLite, or caching layers, often use disk for persistence or temporary storage.
- Large Data Processing: When processing large datasets that can’t be held entirely in RAM, /tmp becomes essential for staging data.
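As an illustration of the embedded-database case, here is a small sketch that stages rows in an on-disk SQLite file under the temp directory instead of holding them in memory. The function name `count_long_words` and the table layout are hypothetical, chosen only for the example.

```python
import os
import sqlite3
import tempfile


def count_long_words(words, min_length, work_dir=None):
    """Stage `words` in an on-disk SQLite file and count those of at least `min_length` characters."""
    work_dir = work_dir or tempfile.gettempdir()  # normally /tmp on Lambda
    fd, db_path = tempfile.mkstemp(suffix=".sqlite", dir=work_dir)
    os.close(fd)  # SQLite opens the (empty) file itself
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE words (word TEXT)")
            conn.executemany(
                "INSERT INTO words (word) VALUES (?)", ((w,) for w in words)
            )
            conn.commit()
            (count,) = conn.execute(
                "SELECT COUNT(*) FROM words WHERE LENGTH(word) >= ?", (min_length,)
            ).fetchone()
            return count
        finally:
            conn.close()
    finally:
        # Remove the database so it does not keep occupying ephemeral storage.
        os.remove(db_path)
```

The same pattern applies to any intermediate data that is too large for RAM: write it to disk, query or stream it back, and delete it when done.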
Before this feature, developers had to resort to workarounds like:
- Memory Optimization: Ruthlessly optimizing code to fit within memory constraints, often at the cost of readability or performance.
- External Storage (S3, EFS): Offloading temporary data to S3 (which involves network latency and cost) or EFS (which adds complexity and latency).
- Custom Docker Images: Building elaborate container images to manage dependencies and data more effectively, but this adds build complexity and startup time.
With the ability to expand /tmp storage up to 10GB (10240MB), Lambda becomes a much more viable option for a wider range of workloads, including those that previously required container services or EC2 instances. The storage is local to each Lambda execution environment and is not shared between concurrently running environments.
The key levers you control are:
- Ephemeral Storage Size: Ephemeral storage is configured independently of memory. You can set it anywhere from 512MB (the default) to 10240MB (10GB), regardless of how much memory the function has. In the AWS console, set it under "Configuration" -> "General configuration" -> "Edit" -> "Ephemeral storage". Storage configured above the default 512MB is billed in addition to the usual duration charges.
- Region Support: This feature is available in all AWS regions.
- Runtime Support: It’s supported across all Lambda runtimes.
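The storage size can also be set programmatically. The sketch below assumes boto3 is installed and AWS credentials are configured; it uses the `EphemeralStorage` parameter of `update_function_configuration`, whose `Size` is given in MB. The helper names and the function name `my-function` are hypothetical.

```python
MIN_EPHEMERAL_MB = 512
MAX_EPHEMERAL_MB = 10240


def build_storage_update(function_name, size_mb):
    """Build keyword arguments for Lambda's UpdateFunctionConfiguration call."""
    if not MIN_EPHEMERAL_MB <= size_mb <= MAX_EPHEMERAL_MB:
        raise ValueError(
            f"size_mb must be between {MIN_EPHEMERAL_MB} and {MAX_EPHEMERAL_MB}"
        )
    return {
        "FunctionName": function_name,
        "EphemeralStorage": {"Size": size_mb},
    }


def apply_storage_update(function_name, size_mb):
    """Send the update to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("lambda")
    return client.update_function_configuration(
        **build_storage_update(function_name, size_mb)
    )


if __name__ == "__main__":
    # "my-function" is a placeholder; this only prints the request arguments.
    print(build_storage_update("my-function", 10240))
```

Validating the size locally is optional, since the service rejects out-of-range values anyway, but it surfaces mistakes before any network call is made.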
AWS does not document the underlying storage implementation, but the observable behavior is well defined. Each execution environment gets its own /tmp, sized by the function’s ephemeral storage setting and independent of its memory allocation. The contents are not guaranteed to be fresh on every invocation: when Lambda reuses a warm execution environment, files written by an earlier invocation may still be present and still count against the limit. When the environment is shut down, the data is gone. Treat /tmp as a scratch area and a possible cache, never as durable storage.
What matters in practice is that /tmp is local to the execution environment rather than a network file share. Writing large files or extracting archives there avoids the network round trips that S3 or EFS involve, so its performance characteristics are closer to a local disk. That makes it suitable for I/O-intensive tasks that were previously prohibitive for Lambda. AWS does not publish throughput guarantees for ephemeral storage, so measure with your own workload before relying on a specific figure.
The next step is to consider how to manage the lifecycle of large temporary files, especially if your function needs to process multiple large datasets sequentially.
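One way to approach that lifecycle question is to give each dataset its own scratch directory and delete it before the next one starts. Lambda may reuse an execution environment for later invocations, so leftover files can otherwise accumulate and eat into the limit. This is a minimal sketch using `tempfile.TemporaryDirectory`; the function name `process_datasets` and the `data.bin` file name are hypothetical, and the "processing" step is reduced to measuring the staged file.

```python
import os
import tempfile


def process_datasets(datasets, base_dir=None):
    """Stage each dataset in its own scratch directory and clean up before the next.

    `datasets` maps an illustrative name to the bytes to stage.
    Returns the staged size in bytes for each name.
    """
    base_dir = base_dir or tempfile.gettempdir()  # normally /tmp on Lambda
    sizes = {}
    for name, payload in datasets.items():
        with tempfile.TemporaryDirectory(dir=base_dir) as scratch:
            staged = os.path.join(scratch, "data.bin")
            with open(staged, "wb") as f:
                f.write(payload)
            # Placeholder for real processing of the staged file.
            sizes[name] = os.path.getsize(staged)
        # The scratch directory and its contents are removed at this point.
    return sizes
```

Because cleanup happens when the `with` block exits, including on exceptions, peak usage stays close to the size of the largest single dataset rather than the sum of all of them.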