Cloud Run is a fully managed platform that runs your container as an autoscaling HTTP service, not just a simple function execution environment.

Let’s see Cloud Run in action. Imagine you have a Python Flask application that just echoes back whatever you send it.

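import os

from flask import Flask, request

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def echo():
    if request.method == 'POST':
        # get_data() returns the raw body even for form-encoded posts
        # (curl -d), where request.data would come back empty.
        return f"Received POST data: {request.get_data(as_text=True)}"
    else:
        return f"Received GET request. Query params: {request.args}"

if __name__ == '__main__':
    # Cloud Run injects PORT; fall back to 8080 for local runs.
    # debug=True is left out: the debugger must not be exposed in a deployed service.
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))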

To deploy this to Cloud Run, you’d first build a container image. A Dockerfile would look like this:

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

CMD ["python", "app.py"]

And requirements.txt would just have Flask.

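Then, you build and push this image to Artifact Registry. (Container Registry, the older home of gcr.io images, has been deprecated in favor of Artifact Registry, which can also serve gcr.io paths.) Let’s say your image is gcr.io/my-project-id/echo-app:latest.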

Now, you deploy it to Cloud Run:

gcloud run deploy echo-app \
  --image gcr.io/my-project-id/echo-app:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --port 8080 \
  --max-instances 5 \
  --cpu 1 \
  --memory 512Mi

After deployment, Cloud Run gives you a URL. If you curl it:

curl "https://echo-app-abcdef123-uc.a.run.app/?name=world"

You’ll get: Received GET request. Query params: ImmutableMultiDict([('name', 'world')])

And a POST request:

curl -X POST -d "hello data" https://echo-app-abcdef123-uc.a.run.app/

Returns: Received POST data: hello data
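
The same two requests can also be built from Python. The following is a minimal sketch using only the standard library; the helper names build_get, build_post, and send are hypothetical, and the URL is the placeholder from the curl examples, so send() only works once it points at a real deployed service.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder URL copied from the curl examples; replace with your service URL.
SERVICE_URL = "https://echo-app-abcdef123-uc.a.run.app/"


def build_get(base_url, params):
    """Build a GET request with the given query parameters."""
    return Request(f"{base_url}?{urlencode(params)}", method="GET")


def build_post(base_url, body):
    """Build a POST request that sends the body as plain UTF-8 text."""
    return Request(
        base_url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def send(req, timeout=10):
    """Send the request and return the response body as text."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only meaningful against a deployed service.
    print(send(build_get(SERVICE_URL, {"name": "world"})))
    print(send(build_post(SERVICE_URL, "hello data")))
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before any network call is made.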

The core problem Cloud Run solves is running web services and APIs without managing servers. You provide a container, and Google handles scaling, patching, load balancing, and networking. Cloud Functions, on the other hand, is designed for event-driven, single-purpose code snippets that execute in response to specific triggers (like Pub/Sub messages, HTTP requests, or Cloud Storage events).

Cloud Run implements the Knative Serving API, but the fully managed service runs on Google’s own infrastructure rather than on a Kubernetes cluster that you can see or operate. When a request comes in, Cloud Run can scale your service from zero instances up to your configured maximum. It routes traffic to your container, which must listen on the port given by the PORT environment variable. Cloud Run injects that variable, and the --port 8080 flag in the deploy command sets its value, so the app should read it with os.environ.get("PORT", 8080) rather than hardcoding the number. You control the compute resources allocated to each instance (--cpu, --memory), the maximum number of instances (--max-instances), and concurrency (how many requests a single instance can handle simultaneously, which defaults to 80 and can be raised to 1000).
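
To get a feel for how concurrency and --max-instances interact, here is a rough back-of-the-envelope sketch. It is not Cloud Run's actual autoscaling algorithm; it simply applies Little's law (requests in flight = arrival rate × average latency) and divides by the per-instance concurrency. The function name estimate_instances and the traffic numbers are hypothetical; 80 is Cloud Run's documented default concurrency and 5 matches the --max-instances value used in the deploy command.

```python
import math


def estimate_instances(requests_per_second, avg_latency_seconds,
                       concurrency, max_instances):
    """Roughly estimate how many instances a steady load would need.

    This is a simplified model, not Cloud Run's real autoscaler.
    """
    if concurrency < 1 or max_instances < 1:
        raise ValueError("concurrency and max_instances must be at least 1")
    # Little's law: average number of requests in flight at any moment.
    in_flight = requests_per_second * avg_latency_seconds
    # Each instance can serve up to `concurrency` requests at once.
    needed = math.ceil(in_flight / concurrency)
    # The service never scales past the configured maximum.
    return min(needed, max_instances)


if __name__ == "__main__":
    print(estimate_instances(200, 0.5, 80, 5))   # moderate load
    print(estimate_instances(2000, 0.5, 80, 5))  # load beyond the cap
    print(estimate_instances(0, 0.5, 80, 5))     # idle: scale to zero
```

When the estimate exceeds the cap, the extra requests have to queue or be rejected, which is the signal to raise --max-instances or the concurrency setting.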

Cloud Functions has a simpler execution model: you deploy a single function with a single purpose. While it can be triggered by HTTP, it’s not inherently designed to be a long-running web server or API endpoint in the same way Cloud Run is. Its lifecycle is tied to the event it’s processing. If you need to run a complex application with multiple endpoints, background tasks, or persistent connections, Cloud Run is the more natural fit. Cloud Functions excels at reacting to events and performing discrete tasks.

The most surprising thing is how seamlessly Cloud Run handles "scale to zero". When there are no incoming requests, and no minimum instance count is configured, Cloud Run scales your service down to zero running containers. With the default request-based billing, you then pay nothing for compute while the service is idle, which is a significant cost advantage over always-on instances or platforms with a minimum charge. The trade-off is a cold start: the first request after an idle period waits while a new instance starts. The infrastructure is still there, managed by Google, but your specific container isn’t consuming CPU or memory until a request arrives.
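
To make the cost intuition concrete, the sketch below compares how long a single instance is actually busy with how long an always-on server would run. It is a deliberately simplified model: it assumes compute is only counted while at least one request is in flight, and it ignores billing granularity, idle time before shutdown, and minimum instances. The function name busy_seconds and the sample request timings are hypothetical.

```python
def busy_seconds(requests):
    """Total time during which at least one request is in flight.

    `requests` is an iterable of (start, end) times in seconds.
    Overlapping requests are merged so shared time is counted once.
    """
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(requests):
        if end < start:
            raise ValueError("a request cannot end before it starts")
        if current_end is None or start > current_end:
            # Gap with no traffic: close the previous busy window.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current busy window: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


if __name__ == "__main__":
    # Three requests over one hour; the first two overlap.
    sample = [(0, 2), (1, 3), (10, 11)]
    busy = busy_seconds(sample)
    always_on = 3600
    print(f"busy: {busy} s, always-on: {always_on} s")
```

For sparse traffic like this, the busy time is a tiny fraction of the hour, which is where scale to zero pays off.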

The next concept to explore is managing environment variables and secrets for your Cloud Run services, and how to integrate them with other GCP services like Cloud SQL or Pub/Sub.
