Lambda functions can react to S3 object creation events, allowing you to process files as soon as they’re uploaded without constant polling.

Here’s a Python Lambda function that runs when a new object is put into an S3 bucket, reading its contents and printing them.

import json
import urllib.parse

import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    print("Received event:", json.dumps(event, indent=2))

    # Get the bucket name and object key from the event
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    # S3 URL-encodes keys in event payloads (e.g. spaces become '+'), so decode first
    object_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    try:
        # Get the object from S3
        response = s3_client.get_object(Bucket=bucket_name, Key=object_key)

        # Read the object's content
        object_content = response['Body'].read().decode('utf-8')
        print(f"Content of s3://{bucket_name}/{object_key}:\n{object_content}")

        # You can add your processing logic here
        # For example, parsing JSON, resizing images, etc.

        return {
            'statusCode': 200,
            'body': json.dumps(f"Successfully processed s3://{bucket_name}/{object_key}")
        }

    except Exception as e:
        print(f"Error processing object {object_key} from bucket {bucket_name}: {e}")
        return {
            'statusCode': 500,
            'body': json.dumps(f"Error processing object: {e}")
        }

To set this up, you’ll need:

  1. An S3 Bucket: Create one if you don’t have it already. Let’s call it my-auto-process-bucket.
  2. An IAM Role for Lambda: This role needs s3:GetObject on the objects in the bucket, plus logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents so the function can write to CloudWatch Logs.
  3. The Lambda Function: Create a new Lambda function, choose a currently supported Python runtime (the original example used Python 3.9), and paste the code above. Assign the IAM role you created.
  4. The S3 Trigger:
    • Go to your Lambda function’s configuration.
    • Click "Add trigger."
    • Select "S3" as the trigger type.
    • Choose your bucket (my-auto-process-bucket).
    • For "Event types," select "All object create events" (or more specific ones like PUT, POST, COPY).
    • You can add a prefix or suffix filter if you only want to trigger on certain files.
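    • Acknowledge the recursive invocation warning. It matters if your function writes back to the same bucket, because each write can trigger the function again.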
    • Click "Add."
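
The permissions from step 2 can be sketched as a policy document. This is a minimal sketch, not a hardened policy: build_lambda_policy is a hypothetical helper, and the broad arn:aws:logs:*:*:* resource is an assumption you would normally narrow to your function's log group.

```python
import json


def build_lambda_policy(bucket_name):
    """Build an IAM policy document (as a dict) for the Lambda execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access to objects in the bucket (note the trailing /*)
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # CloudWatch Logs access; the wildcard resource is an assumption
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


print(json.dumps(build_lambda_policy("my-auto-process-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM console's policy editor when creating the role.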

Now, when you upload any file to my-auto-process-bucket, this Lambda function will automatically execute, fetch the file’s content, and print it to CloudWatch Logs.

The core mechanism is S3’s event notification system. When an object event occurs (like s3:ObjectCreated:Put), S3 publishes a message to a configured destination. For Lambda triggers, S3 invokes the Lambda function directly and asynchronously, passing the event details (bucket name, object key, etc.) in the payload. The Lambda function then uses these details to interact with S3 and perform its task.

The event dictionary provided to the lambda_handler is a structured JSON object containing all the information about the S3 event. The most crucial parts for processing are event['Records'][0]['s3']['bucket']['name'] and event['Records'][0]['s3']['object']['key'], which pinpoint exactly which object in which bucket was just created.
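
To make that structure concrete, here is a minimal sketch using a trimmed-down sample event. The sample values (key, size) are made up for illustration, and extract_s3_objects is a hypothetical helper, not part of boto3. One detail worth knowing: S3 URL-encodes object keys in event payloads (a space arrives as +), so the sketch decodes them with urllib.parse.unquote_plus.

```python
from urllib.parse import unquote_plus

# Trimmed-down sample event; real S3 events carry more fields per record.
SAMPLE_EVENT = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-auto-process-bucket"},
                "object": {"key": "uploads/monthly+report.csv", "size": 1024},
            },
        }
    ]
}


def extract_s3_objects(event):
    """Return a list of (bucket_name, decoded_object_key) pairs from an S3 event."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket_name = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the event, so decode before calling S3
        object_key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket_name, object_key))
    return pairs


print(extract_s3_objects(SAMPLE_EVENT))
# → [('my-auto-process-bucket', 'uploads/monthly report.csv')]
```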

While the basic setup is straightforward, understanding how to filter events is key for efficiency. You can specify a prefix (e.g., uploads/), a suffix (e.g., .csv), or both in the S3 trigger configuration. With a prefix of uploads/, the function is invoked only for keys that start with uploads/; with a suffix of .csv, only for keys that end with .csv. If you set both, a key must match both. Either way, unrelated files no longer cause unnecessary invocations.
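
The same filter can be configured outside the console. The sketch below builds the notification configuration that boto3's put_bucket_notification_configuration expects. The helper names and the function ARN are placeholders invented for illustration. It also assumes S3 already has permission to invoke the function, which the console's "Add trigger" flow grants for you.

```python
def build_notification_config(lambda_arn, prefix=None, suffix=None):
    """Build an S3 notification configuration that targets one Lambda function."""
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})

    function_config = {
        "LambdaFunctionArn": lambda_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if rules:
        # When both rules are present, a key must match both of them
        function_config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [function_config]}


def apply_notification_config(bucket_name, notification_config):
    """Apply the configuration. This REPLACES the bucket's existing notifications."""
    import boto3  # imported here so the builder above runs without AWS dependencies

    s3_client = boto3.client("s3")
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=notification_config,
    )


# Placeholder ARN; substitute your own function's ARN.
config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:example-function",
    prefix="uploads/",
    suffix=".csv",
)
print(config)
```

Calling apply_notification_config("my-auto-process-bucket", config) would then install the trigger. Because the call overwrites whatever notification configuration the bucket already has, merge in any existing entries first if the bucket has other consumers.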

The actual processing logic inside the try block is where you’d implement your specific use case. For text files, reading and decoding is simple. For images, you’d use libraries like Pillow to manipulate them. For structured data like JSON or CSV, you’d parse the object_content accordingly. The boto3 SDK’s get_object method returns a dictionary, and the actual file data is streamed via the Body key, which is a streaming body object. You must read() from this body to get the bytes and then decode() them into a string if it’s a text-based file.
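
As one possible shape for that processing step, the sketch below picks a parser based on the key's extension. parse_object_bytes is a hypothetical helper, and it assumes UTF-8 text and CSV files with a header row. Binary formats such as images would need a different path.

```python
import csv
import io
import json


def parse_object_bytes(object_key, raw_bytes):
    """Parse the bytes read from response['Body'] based on the key's extension."""
    text = raw_bytes.decode("utf-8")  # assumes UTF-8 encoded text
    lowered = object_key.lower()
    if lowered.endswith(".json"):
        return json.loads(text)
    if lowered.endswith(".csv"):
        # Assumes the first row is a header; each row becomes a dict
        return list(csv.DictReader(io.StringIO(text)))
    # Fall back to returning the decoded text unchanged
    return text


print(parse_object_bytes("uploads/users.csv", b"id,name\n1,Ada\n2,Grace\n"))
print(parse_object_bytes("uploads/config.json", b'{"enabled": true}'))
```

Inside the handler, you would pass it the result of response['Body'].read() together with the object key.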

The context object in the lambda_handler provides runtime information about the Lambda function, such as its remaining execution time (context.get_remaining_time_in_millis()) and request ID (context.aws_request_id). While not used in this basic example, it’s invaluable for more complex scenarios, like implementing retry logic or logging detailed execution context.
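
One way to use the context object is to stop work before the function times out. The sketch below is illustrative only: FakeContext is a local stand-in so the code runs outside Lambda, and process_items, handle_item, and the 5-second safety margin are assumptions. It relies on the real context method get_remaining_time_in_millis() and the aws_request_id attribute.

```python
SAFETY_MARGIN_MS = 5000  # assumed margin; tune for your workload


class FakeContext:
    """Local stand-in for the Lambda context object (for testing only)."""

    def __init__(self, remaining_ms, request_id="local-test"):
        self._remaining_ms = remaining_ms
        self.aws_request_id = request_id

    def get_remaining_time_in_millis(self):
        return self._remaining_ms


def process_items(items, context, handle_item):
    """Handle items in order, stopping early if the time budget is nearly spent."""
    processed = []
    for item in items:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            remaining = len(items) - len(processed)
            print(f"[{context.aws_request_id}] stopping early, {remaining} item(s) left")
            break
        handle_item(item)
        processed.append(item)
    return processed


# Plenty of time left: every item is handled
print(process_items(["a.csv", "b.csv"], FakeContext(60000), lambda item: None))
# Almost out of time: nothing is handled
print(process_items(["a.csv"], FakeContext(1000), lambda item: None))
```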

The Records field in the event payload is a list, so the format allows more than one record per invocation. In practice S3 typically delivers one record per event (a multipart upload, for instance, produces a single s3:ObjectCreated:CompleteMultipartUpload event when it completes), but relying on that is fragile. Your Lambda function, as written, only processes the first record (event['Records'][0]). For robust applications, you should iterate through all records:

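from urllib.parse import unquote_plus

def lambda_handler(event, context):
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        # Keys arrive URL-encoded in S3 events, so decode before use
        object_key = unquote_plus(record['s3']['object']['key'])
        # ... process object_key and bucket_name ...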

This ensures that every single file upload that matches your trigger criteria is handled, even if multiple occur within the same S3 event notification.
