Understand the Gatling Virtual User Lifecycle and Execution Model (2026)

The number of virtual users a Gatling simulation can sustain is not directly limited by Gatling itself, but rather by the network and system resources of the machine running Gatling.

Let’s see Gatling in action. Imagine you’re testing an API endpoint that greets users. Here’s a simplified Gatling simulation:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BasicApiSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080") // The base URL of your API
    .acceptHeader("application/json")

  val scn = scenario("User Greeting")
    .exec(http("request_greeting")
      .get("/greet/Alice") // The specific endpoint being hit
      .check(status.is(200)) // Asserting a successful response
      .check(jsonPath("$.message").is("Hello, Alice!")) // Checking the response body
    )

  setUp(
    scn.inject(
      rampUsers(100).during(10.seconds), // Start 100 users, spread evenly over 10 seconds
      constantUsersPerSec(50).during(30.seconds) // Then start 50 new users every second for 30 seconds
    ).protocols(httpProtocol)
  )
}

When you run this simulation, Gatling doesn’t just fire off 100 requests at once. It orchestrates the creation and execution of virtual users, each representing a simulated user interacting with your application.

The core of Gatling’s execution model is the Virtual User (VU). A VU is not a thread or a process: it is a lightweight piece of state (a session) that represents a single simulated user’s journey through your scenario and is handed from one step to the next. Because VUs are this cheap, Gatling is designed to run thousands, even tens of thousands, of them concurrently.

Here’s how the lifecycle of a VU plays out:

Creation: When setUp defines an injection profile (like rampUsers or constantUsersPerSec), Gatling determines how many VUs need to be started and when. For rampUsers(100).during(10.seconds), Gatling starts 100 VUs spread evenly over 10 seconds, about 10 per second. For constantUsersPerSec(50).during(30.seconds), Gatling starts 50 new VUs every second for 30 seconds, 1,500 in total. The steps inside one inject call run one after the other, so this simulation starts 1,600 VUs overall. These figures are arrival rates, not concurrency targets: how many VUs are alive at the same moment depends on how long each one takes to finish the scenario.
Execution: Once a VU is created, it starts executing the steps defined in its scenario. In our BasicApiSimulation, a VU performs the http("request_greeting") action, which means sending a GET request to /greet/Alice. It then waits for the response and performs the checks (status.is(200), jsonPath("$.message").is("Hello, Alice!")). If a check fails, the request is reported as failed (KO); otherwise the VU moves on to the next step, if there is one.
Iteration: A VU runs its scenario once, from the first step to the last. It repeats steps only when the scenario says so, through loop constructs such as repeat, during, or forever. With the injection profiles used here, sustained load comes from Gatling continuously starting new VUs, not from existing VUs looping.
Termination: A VU’s life ends when it reaches the end of its scenario. The duration passed to an injection step controls how long Gatling keeps starting new VUs, not how long each one lives. In BasicApiSimulation, a VU sends one request, checks the response, and terminates, after which its state becomes eligible for garbage collection.
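If you do want each VU to stay alive and repeat its steps, the loop has to be written into the scenario. The following is a minimal sketch, not a recommended load profile: the class name LoopingGreetingSimulation, the 30-second loop, the 1-second pause, and the choice of atOnceUsers(10) are all illustrative assumptions, and the endpoint is the same hypothetical /greet/Alice used above. It needs Gatling on the classpath to compile.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Hypothetical variant of BasicApiSimulation in which each VU loops explicitly.
class LoopingGreetingSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080") // Same assumed local API as above

  // One request followed by a pause, so a VU does not hammer the server in a tight loop
  val greet = exec(
    http("request_greeting")
      .get("/greet/Alice")
      .check(status.is(200))
  ).pause(1.second)

  // Each VU keeps repeating `greet` until 30 seconds have elapsed, then terminates
  val scn = scenario("Looping User Greeting")
    .during(30.seconds) {
      greet
    }

  setUp(
    // 10 VUs start together; each one lives for roughly 30 seconds
    scn.inject(atOnceUsers(10)).protocols(httpProtocol)
  )
}
```

Here concurrency is fixed by the number of injected VUs, and request throughput follows from the pause and the response time, which is the reverse of the arrival-rate profiles in BasicApiSimulation.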

The key to Gatling’s performance is that VUs are never tied to a dedicated thread. Unlike traditional thread-per-user models, Gatling uses an asynchronous, non-blocking I/O model built on Netty (older releases also relied on Akka for internal message passing). When a VU makes an HTTP request, it doesn’t block a thread waiting for the response. Instead, a callback is registered and control returns to the event loop. When the response arrives, the event loop resumes the VU’s execution from where it left off. This allows a single thread to manage hundreds or thousands of VUs that are waiting for I/O operations to complete.
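The idea can be illustrated without Gatling at all. The sketch below is not Gatling’s implementation: it uses a single-threaded scheduler from the JDK as a stand-in for an event loop, and fakeRequest is a hypothetical placeholder for an HTTP call that completes after a fixed latency. The point it shows is that one thread can carry many simulated users, because none of them holds the thread while waiting.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object NonBlockingSketch {

  // Runs `n` simulated users on ONE thread and returns the ids of those that completed.
  def runUsers(n: Int, latency: FiniteDuration): Seq[Int] = {
    // A single thread plays the role of the event loop (assumption for illustration)
    val loop = Executors.newSingleThreadScheduledExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(loop)

    // Hypothetical stand-in for an HTTP request: the "response" arrives after
    // `latency`, and no thread is blocked in the meantime.
    def fakeRequest(userId: Int): Future[Int] = {
      val response = Promise[Int]()
      loop.schedule(
        new Runnable { def run(): Unit = response.success(userId) },
        latency.toMillis,
        TimeUnit.MILLISECONDS
      )
      response.future
    }

    try {
      // Every user issues its request up front; the callbacks fire as responses arrive
      val all = Future.sequence((1 to n).map(id => fakeRequest(id)))
      // Only the caller blocks here, never the event-loop thread
      Await.result(all, 10.seconds)
    } finally loop.shutdown()
  }
}
```

Calling NonBlockingSketch.runUsers(1000, 100.millis) completes all 1,000 simulated users on that one thread. A thread-per-user design would instead need 1,000 threads, each parked for the full latency.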

This asynchronous nature is why Gatling can achieve such high concurrency with relatively low resource consumption on the load generator machine. The primary bottlenecks become network bandwidth, CPU for building requests and processing responses, and memory for holding each VU’s state.

The rampUsers(100).during(10.seconds) injection profile tells Gatling to start 100 virtual users, evenly spaced over 10 seconds, which works out to about 10 new virtual users per second. Each of them executes the "User Greeting" scenario once. Because that scenario contains a single request and no loop, a virtual user terminates as soon as its response has been checked, so far fewer than 100 are active at any one time unless the server responds slowly.
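To make the spacing concrete, here is a small sketch of the arithmetic. It assumes perfectly even spacing between arrivals, which is a simplification, and the names RampSchedule and startOffsetsMs are made up for this example rather than taken from Gatling.

```scala
object RampSchedule {

  // Start offsets in milliseconds for `users` arrivals spread evenly over `durationMs`.
  // Assumes ideal even spacing; a real scheduler may differ slightly.
  def startOffsetsMs(users: Int, durationMs: Long): Seq[Long] =
    (0 until users).map(i => i * durationMs / users)
}
```

For 100 users over 10,000 ms, the offsets are 0, 100, 200, and so on up to 9,900, which places 10 arrivals in each one-second window.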

The constantUsersPerSec(50).during(30.seconds) profile is where the sustained load comes from. Gatling starts 50 new virtual users every second for 30 seconds, 1,500 in total. Each one runs the scenario once and then terminates, so a VU started in the first second is normally long gone by the thirtieth. The number of concurrent VUs is therefore not 1,500. By Little’s Law it is roughly the arrival rate multiplied by the average time a VU takes to complete the scenario: with a 200 ms response time, about 50 users/sec * 0.2 sec = 10 VUs are in flight at any moment. Concurrency approaches 1,500 only if each VU needs the full 30 seconds or more to finish.
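The distinction between users started and users alive at once comes down to two multiplications. The sketch below keeps them apart. The object and method names are hypothetical, and the concurrency figure is a steady-state estimate from Little’s Law, not something Gatling computes for you.

```scala
object ConcurrencyEstimate {

  // Total VUs started by a constant-rate phase: arrival rate * phase duration
  def totalStarted(usersPerSec: Double, phaseSeconds: Double): Double =
    usersPerSec * phaseSeconds

  // Little's Law: average concurrent VUs = arrival rate * average time each VU
  // spends in the scenario (response time plus any pauses)
  def averageConcurrent(usersPerSec: Double, avgScenarioSeconds: Double): Double =
    usersPerSec * avgScenarioSeconds
}
```

For the profile above, totalStarted(50, 30) gives 1500.0, while averageConcurrent(50, 0.5) gives 25.0 if each VU takes half a second to run the scenario. The slower the system under test responds, the higher the concurrency climbs for the same arrival rate.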

If your simulation’s checks are failing or requests are timing out, the next thing you’ll likely encounter is Gatling reporting a high number of failed requests.