A Fly.io app can scale down to zero running Machines when it’s not actively serving traffic, so you stop paying for idle compute.

Here’s a simple Go application that demonstrates this:

package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello from Fly.io! The time is %s\n", time.Now().Format(time.RFC3339))
	})

	port := os.Getenv("PORT")
	if port == "" {
		port = "8080" // Default to 8080 if PORT env var is not set
	}

	log.Printf("Server starting on port %s", port)
	if err := http.ListenAndServe(":"+port, nil); err != nil {
		log.Fatal(err)
	}
}

To deploy this, you’d typically have a fly.toml file. The key to scaling to zero lies in three settings on the app’s HTTP service: auto_stop_machines, auto_start_machines, and min_machines_running.

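app = "my-scale-to-zero-app"
primary_region = "ord" # Or your preferred region

[build]
  dockerfile = "Dockerfile" # Assumes a Dockerfile in the project root that compiles the Go binary

[env]
  PORT = "8080" # Read by the Go app; must match internal_port

[http_service]
  internal_port = 8080
  auto_stop_machines = "stop" # Older flyctl versions use true/false here
  auto_start_machines = true
  min_machines_running = 0

  [http_service.concurrency]
    type = "requests"
    soft_limit = 1
    hard_limit = 1 # One request at a time per Machine

# There is no maximum setting: the upper bound is the number of Machines
# you have created for the app, for example with fly scale count.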

When you run fly deploy, Fly.io creates Machines for your app. With auto_stop_machines enabled and min_machines_running set to 0, the Fly Proxy checks every few minutes whether the app has more running Machines than its traffic needs, and it stops a Machine that is handling no requests. A stopped Machine isn’t running your app, so you’re not charged for its CPU and memory while it is stopped. When new traffic arrives and auto_start_machines is enabled, the proxy starts one of the stopped Machines again; it restarts existing Machines rather than creating new ones.

The concurrency settings in fly.toml determine how many requests a single Machine handles simultaneously. A hard limit of 1 means each Machine processes only one request at a time. This can be useful for simple applications or for work that must not overlap on a Machine, but it also means that several concurrent requests can require several running Machines. The proxy can only start Machines that already exist, so the ceiling is the number of Machines you have created for the app (for example with fly scale count), not a setting in fly.toml.
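As a rough illustration of that arithmetic, and not a description of the Fly Proxy’s actual algorithm, the sketch below estimates how many Machines would have to be running for a given number of simultaneous requests. The function machinesNeeded and its parameters are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// machinesNeeded is a hypothetical helper: it returns how many Machines must be
// running to serve `concurrent` simultaneous requests when each Machine accepts
// at most `hardLimit` requests at once and only `available` Machines exist.
func machinesNeeded(concurrent, hardLimit, available int) int {
	if concurrent <= 0 || hardLimit <= 0 || available <= 0 {
		return 0
	}
	needed := (concurrent + hardLimit - 1) / hardLimit // ceiling division
	if needed > available {
		return available // cannot start more Machines than exist
	}
	return needed
}

func main() {
	fmt.Println(machinesNeeded(3, 1, 5))   // 3: one Machine per request
	fmt.Println(machinesNeeded(10, 1, 5))  // 5: capped by the Machines that exist
	fmt.Println(machinesNeeded(10, 25, 5)) // 1: a higher limit needs fewer Machines
}
```

With a limit of 1, the number of running Machines tracks the number of in-flight requests directly, which is why a low limit can make a bursty app start many Machines at once.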

The internal_port is the port your application listens on inside the Machine. Fly.io routes external traffic to this port. The Go example reads the PORT environment variable and falls back to 8080 when it isn’t set, which is convenient for local development. Fly.io doesn’t define PORT on its own, so either set it yourself (for example in an [env] section of fly.toml) or rely on the fallback; in both cases the port the app listens on must match internal_port.

With auto-stop and auto-start both enabled, the Fly Proxy handles every inbound request to your app’s public address. When a request arrives, the proxy checks whether a started Machine has capacity for it. If none does, it starts a stopped Machine, waits for it to come up, and then forwards the request. Once traffic dies down and the proxy’s next check finds the Machine idle, the Machine is stopped again; it is not destroyed, so it can be started the next time it is needed. This "scale to zero" capability is crucial for cost optimization, especially for applications with sporadic traffic patterns.

A useful detail of the auto-stop feature is that it doesn’t just kill the process; it attempts a graceful shutdown. When the proxy stops a Machine due to inactivity, your application’s main process receives a signal. By default this is SIGINT, and you can change it with the kill_signal setting in fly.toml (to SIGTERM, for example). This gives your application a chance to clean up resources, close database connections, or finish ongoing work before exiting. If the process hasn’t exited when the kill timeout expires (5 seconds by default, configurable with kill_timeout), the Machine is stopped forcibly. Handle the shutdown signal in your application if you need to perform cleanup during shutdowns, whether they are caused by inactivity or by a new fly deploy.

When your app scales down to zero and a new request comes in, the next thing you’ll observe is the initial latency of a cold start.
