Configure Fly.io Wake-on-Request for Zero-Cost Idle Apps (2026)

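What this article calls "wake-on-request" is Fly.io's autostop/autostart feature: the Fly Proxy stops your app's machines when they have no traffic and starts them again when a request arrives, so you don't pay for compute time while nothing is using the app.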

Here's a simplified, illustrative view of an app on Fly.io whose machine is currently stopped, followed by how it responds to a request:

{
  "id": "app-12345",
  "name": "my-sleeping-app",
  "state": "stopped",
  "region": "ord",
  "release_id": "rel-abcdef123",
  "created_at": "2023-10-27T10:00:00Z",
  "updated_at": "2023-10-27T10:00:00Z"
}

When a request comes in for my-sleeping-app.fly.dev, Fly.io automatically starts the stopped machine. The first request sees a brief delay, typically a few seconds, while the machine boots and your app starts.

curl -I https://my-sleeping-app.fly.dev

The first response might look like this. The headers are the same as for any other response; the only sign of the cold start is the extra time the request takes:

HTTP/2 200
content-type: text/plain; charset=utf-8
server: Fly/8a7b5c4d (2023-10-27T11:00:00Z)
fly-request-id: 01GZXXYYZZAA00000000000000
date: Fri, 27 Oct 2023 11:05:00 GMT
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000

After the initial wake-up, subsequent requests to the same app are fast because the machine stays running. Once the app has been idle long enough, Fly.io automatically stops the machine again.
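One way to see the difference between the first and later requests is to time two requests back to back. The following is a minimal sketch, assuming curl is installed; time_request and compare_cold_warm are hypothetical helper names, and the URL is the example app from above.

```shell
#!/usr/bin/env bash

# time_request URL: print the total time (in seconds) curl needed for one request.
time_request() {
  curl -s -o /dev/null -w '%{time_total}\n' "$1"
}

# compare_cold_warm URL: time two back-to-back requests to the same URL.
# If the machine was stopped, the first number includes the cold start.
compare_cold_warm() {
  local url="$1"
  echo "first request:  $(time_request "$url")s"
  echo "second request: $(time_request "$url")s"
}

# Usage (run while the machine is stopped to observe the cold start):
#   compare_cold_warm https://my-sleeping-app.fly.dev
```

If the machine was already running, both numbers should be similar, so the comparison is only meaningful after the app has had time to go idle.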

To enable autostop and autostart, configure the service in your app's fly.toml file. Three settings matter: auto_stop_machines, auto_start_machines, and min_machines_running.

Here's a typical fly.toml for an app that is allowed to scale to zero when idle:

app = "my-sleeping-app"
primary_region = "ord"

[build]
  image = "your-docker-image"

[http_service]
  internal_port = 8080
  force_https = true
  # Let the Fly Proxy stop machines when the app is idle...
  auto_stop_machines = "stop"
  # ...and start them again when a request arrives.
  auto_start_machines = true
  # Allow every machine to stop; 1 would keep one running.
  min_machines_running = 0

  [http_service.concurrency]
    type = "requests"
    hard_limit = 500
    soft_limit = 100

In this configuration:

auto_stop_machines = "stop" tells the Fly Proxy to stop machines when the app has no traffic. Older fly.toml files may show the boolean true here; "off" disables autostop.
auto_start_machines = true lets the proxy start a stopped machine when a request arrives. Without it, a stopped machine stays stopped.
min_machines_running = 0 is crucial. It allows every machine to stop when there is no traffic. With a value of 1, one machine in the primary region always stays running, defeating the purpose of zero-cost idle.

These settings belong in the service definition ([http_service] or a [[services]] entry), not in the [experimental] section. As far as the fly.toml reference documents, there is no idle-timeout setting. The proxy checks load every few minutes and stops machines that are surplus to current traffic, so expect an idle app to stop a few minutes after its last request rather than at an exact deadline.

The [http_service.concurrency] block is what the proxy uses to judge load. Both limits are per machine: soft_limit is the level at which the proxy treats a machine as busy and prefers to send traffic elsewhere, starting a stopped machine if one exists, and hard_limit is the ceiling beyond which it sends that machine no more traffic. Autostart only starts machines that already exist. It never creates new ones, so the number of machines you have deployed caps how far the app can scale up.

When you deploy this configuration, Fly.io creates one or more machines for your app. Once the app has been idle for a few minutes, the proxy stops them. The next request starts a stopped machine and is routed to it once it is up. This cycle is what makes it "wake-on-request."
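After deploying, it helps to confirm that the machines really do stop and start. The sketch below wraps a few flyctl commands, assuming flyctl is installed and logged in. The function names are hypothetical helpers, APP is a placeholder for your app name, and the machine ID passed to force_sleep comes from the list output.

```shell
#!/usr/bin/env bash

# Placeholder app name; override by exporting APP before sourcing this file.
APP="${APP:-my-sleeping-app}"

# Deploy the fly.toml in the current directory.
deploy_app() {
  fly deploy --app "$APP"
}

# List the app's machines along with their current state.
show_machines() {
  fly machine list --app "$APP"
}

# Stop one machine by ID so that the next request has to wake it.
force_sleep() {
  fly machine stop "$1" --app "$APP"
}

# Usage:
#   deploy_app
#   show_machines
#   force_sleep <machine-id>
```

Stopping a machine by hand and then sending a request is a quick way to exercise the wake-up path without waiting for the app to go idle on its own.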

It's important to understand what "idle" means here: the Fly Proxy is routing no requests or connections to your app. Fly.io's own health checks do not count as traffic and do not keep a machine awake. A long-lived connection through the proxy, such as a WebSocket, does count for as long as it stays open. Only traffic that passes through the proxy is considered, which normally means requests to your app's public address.

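When a machine is stopped due to inactivity, it uses no CPU or memory, and you are not billed for compute time. Storage can still be charged while it is stopped, for example attached volumes and the stopped machine's root filesystem, so check Fly.io's current pricing before assuming a strictly zero bill. When a request arrives, Fly.io starts the stopped machine and routes the request to it. This is the core mechanism behind near-zero cost for idle applications.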
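To get a feel for the savings, compute cost can be approximated as an hourly rate multiplied by the hours the machine is actually awake. The sketch below does that arithmetic with awk. The rate used in the usage lines is a made-up placeholder, not Fly.io's pricing, and the 30-day month and the function name are assumptions for illustration.

```shell
#!/usr/bin/env bash

# estimate_monthly_compute RATE_PER_HOUR ACTIVE_HOURS_PER_DAY
# Prints rate * hours * 30, i.e. compute cost for a 30-day month.
# Storage and bandwidth charges are deliberately left out.
estimate_monthly_compute() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours * 30 }'
}

# Usage, with a hypothetical rate of 0.01 per hour:
#   estimate_monthly_compute 0.01 24   # always on
#   estimate_monthly_compute 0.01 2    # awake about 2 hours a day
```

With that placeholder rate, the always-on case comes to 7.20 and the two-hours-a-day case to 0.60, which shows why the approach suits apps that are idle most of the day.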

A notable detail is how this works at the infrastructure level: Fly.io doesn't just pause a process; it shuts down the machine's entire virtual machine. Your code stops running, the whole execution environment goes away, and the VM is booted again on demand. The machine itself still exists in a stopped state, so waking it is a boot rather than a fresh deployment. This is why the first request after a period of inactivity has noticeable latency: it covers the time needed to boot the VM and start your app.

Consider the trade-off: you save money when your app is idle, but you accept a cold-start latency for the first user who hits your app after it has slept. For many types of applications – like internal tools, infrequent APIs, or personal projects – this latency is a small price to pay for significant cost savings.
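Clients and scripts that call a sleeping app should tolerate the cold start instead of failing on the first slow response. The following is a minimal sketch, assuming curl is installed; wake_app is a hypothetical helper, and the 30-second timeout and three retries are arbitrary starting values rather than recommendations from Fly.io.

```shell
#!/usr/bin/env bash

# wake_app URL: send one request with a timeout generous enough for a cold
# start, retrying a few times on transient failures.
# Returns curl's exit status (0 on success).
wake_app() {
  curl --silent --show-error --fail --output /dev/null \
       --max-time 30 --retry 3 --retry-delay 2 "$1"
}

# Usage, for example before running a test suite against the app:
#   wake_app https://my-sleeping-app.fly.dev && echo "app is awake"
```

Calling something like this at the start of a scheduled job or test run moves the cold-start delay out of the path of the requests you actually care about.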

If you need your application to respond instantly at all times, set min_machines_running = 1 in fly.toml so that one machine always stays up in the primary region, or turn autostop off entirely with auto_stop_machines = "off". Either way, size the concurrency hard_limit for the load you expect each machine to carry.

The next concept you’ll likely explore is managing application state across these wake-up cycles, especially if your app relies on in-memory data that would be lost when the machine stops.