Choose Between Open and Closed Workload Models in Gatling (2026)

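The core difference between Gatling’s open and closed workload models isn’t how many requests you send, but what decides when the next virtual user starts. In the open model you control the arrival rate of new users; in the closed model you control how many users are active at once.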

Let’s see Gatling in action. Imagine you’re testing a simple API endpoint that takes 2 seconds to respond.

Here’s a basic Gatling simulation using the open workload model:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class OpenWorkloadSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080")
    .doNotTrackHeader("1") // the argument is the DNT header's value, so "1" opts out of tracking
    .acceptLanguageHeader("en-US,en;q=0.5")
    .acceptEncodingHeader("gzip,deflate")
    .userAgentHeader("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3")

  val scn = scenario("Open Workload Scenario")
    .exec(http("request_1")
      .get("/resource")
      .check(status.is(200)))

  setUp(
    scn.inject(
      rampUsers(100).during(30.seconds) // Inject 100 users in total, spread evenly over 30 seconds
    ).protocols(httpProtocol)
  ).maxDuration(1.minute)
}

In this OpenWorkloadSimulation, rampUsers(100) starts 100 new virtual users on a fixed schedule over 30 seconds, roughly 3.3 per second. If the system takes 2 seconds to respond, Gatling doesn’t wait for earlier users to finish before starting the next ones. Each individual user still runs its scenario sequentially and waits for its own response, but new arrivals are independent of response times. If you’ve specified, say, constantUsersPerSec(100), Gatling will start 100 new users every second regardless of how slowly the server answers, so slower responses mean more users in flight at once. This models a system where new users keep arriving no matter how the service is coping, as on a public website or API with nothing in front of it to hold traffic back.
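To get a feel for how many users an open model keeps in flight, Little’s law is a handy back-of-the-envelope tool: average concurrency equals arrival rate times the time each user spends in the system. The sketch below is plain Scala with no Gatling dependency. The object and method names are made up for illustration, and the numbers assume a steady state with a constant response time.

```scala
// Hypothetical helper, not part of Gatling: Little's law for an open workload.
object OpenModelEstimate {

  /** Average number of users in flight = arrival rate x time per user. */
  def usersInFlight(usersPerSec: Double, secondsPerUser: Double): Double =
    usersPerSec * secondsPerUser

  def main(args: Array[String]): Unit = {
    val rate = 10.0 // new users per second, as with constantUsersPerSec(10)
    // The arrival rate stays fixed, so concurrency grows with response time.
    for (responseSec <- Seq(0.5, 2.0, 5.0)) {
      val inFlight = usersInFlight(rate, responseSec)
      println(f"response $responseSec%.1f s -> about $inFlight%.0f users in flight")
    }
  }
}
```

The point to take away is that in an open model the load generator never eases off: if the server slows down, the number of simultaneous users rises in proportion.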

Now, let’s look at the closed workload model. In a closed model, you define the number of concurrent users you want to maintain. Gatling keeps that many users active and starts a new user only when an existing one finishes its scenario, including any configured think time.

Here’s how you’d configure a closed workload:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class ClosedWorkloadSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080")
    .doNotTrackHeader("1") // the argument is the DNT header's value, so "1" opts out of tracking
    .acceptLanguageHeader("en-US,en;q=0.5")
    .acceptEncodingHeader("gzip,deflate")
    .userAgentHeader("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3")

  val scn = scenario("Closed Workload Scenario")
    .exec(http("request_1")
      .get("/resource")
      .check(status.is(200)))
    .pause(2) // Simulate user think time of 2 seconds

  setUp(
    scn.inject(
      // Closed model: keep 50 users active for 1 minute.
      // When a user finishes its request and pause, Gatling starts a replacement.
      constantConcurrentUsers(50).during(1.minute)
      // To grow the concurrency gradually instead:
      // rampConcurrentUsers(10).to(50).during(30.seconds)
      // Note: atOnceUsers, rampUsers and constantUsersPerSec are open-model steps
      // and cannot be mixed with closed-model steps in the same injection profile.
    ).protocols(httpProtocol)
  ).maxDuration(1.minute)
}

In the ClosedWorkloadSimulation, we’ve added a pause(2) to simulate user think time. If you inject with constantConcurrentUsers(50).during(1.minute), Gatling keeps 50 users active: each user makes a request, pauses, and finishes, at which point Gatling starts a new user in its place. If the request takes 2 seconds and the think time is 2 seconds, 50 users are "active" at any moment (meaning they have an ongoing request or are in their pause), and throughput settles at about 50 / 4 = 12.5 requests per second. The request rate adjusts dynamically: if the response time goes up, the throughput goes down, because Gatling is holding concurrency fixed rather than the arrival rate. Note that atOnceUsers(50) does not give this behavior. It is an open-model step that starts 50 users once and never replaces them. The closed model fits systems that cap how many users they serve at once, such as a ticketing site with a waiting queue or a fixed pool of workers calling a service.
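The throughput figure for a closed model can be estimated the same way, by dividing the number of concurrent users by the length of one user cycle (response time plus think time). The following is a minimal sketch in plain Scala. The helper is hypothetical, not a Gatling API, and it assumes each user makes exactly one request per cycle, as in the scenario above.

```scala
// Hypothetical helper, not part of Gatling: steady-state throughput of a closed workload.
object ClosedModelEstimate {

  /** Requests per second when each user makes one request per cycle. */
  def requestsPerSec(concurrentUsers: Int, responseSec: Double, thinkSec: Double): Double =
    concurrentUsers / (responseSec + thinkSec)

  def main(args: Array[String]): Unit = {
    val users = 50     // held constant by the closed model
    val thinkSec = 2.0 // matches pause(2) in the scenario
    // Concurrency stays fixed, so throughput falls as responses slow down.
    for (responseSec <- Seq(2.0, 4.0, 8.0)) {
      val rps = requestsPerSec(users, responseSec, thinkSec)
      println(f"response $responseSec%.1f s -> about $rps%.2f requests/s")
    }
  }
}
```

This is why a closed model can hide an overload: a struggling server automatically receives less traffic, so the test may look healthier than production would under a fixed arrival rate.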

The critical distinction is what triggers the start of the next virtual user. In the open model, it’s a fixed arrival rate (e.g., 10 new users per second). In the closed model, it’s the completion of an existing user’s scenario (request plus think time), so that concurrency stays constant.
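In the Scala DSL, the two models map onto two separate families of injection steps. The simulation below is a sketch rather than a tuned profile: the class name, scenario names, rates, and durations are placeholders, and it assumes a reasonably recent Gatling 3.x where constantConcurrentUsers and rampConcurrentUsers are available. Gatling does not allow open and closed steps to be mixed within one injection profile, so each scenario here sticks to one family.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Illustrative only: names, rates, and durations are placeholders.
class WorkloadModelsSimulation extends Simulation {

  val httpProtocol = http.baseUrl("http://localhost:8080")

  // Same user journey for both populations: one request, then 2 seconds of think time.
  def resourceScenario(name: String) =
    scenario(name)
      .exec(http("request_1").get("/resource").check(status.is(200)))
      .pause(2.seconds)

  setUp(
    // Open model: every step describes how new users arrive.
    resourceScenario("Open").inject(
      rampUsersPerSec(1).to(10).during(30.seconds), // arrival rate grows from 1/s to 10/s
      constantUsersPerSec(10).during(1.minute)      // then holds at 10 new users per second
    ),
    // Closed model: every step describes how many users are active at once.
    resourceScenario("Closed").inject(
      rampConcurrentUsers(10).to(50).during(30.seconds), // concurrency grows from 10 to 50
      constantConcurrentUsers(50).during(1.minute)       // then holds at 50 active users
    )
  ).protocols(httpProtocol)
}
```

Running both populations in one simulation is only for side-by-side reading. In a real test you would normally pick the model that matches how traffic reaches the system and run that one alone.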

The most counterintuitive aspect of Gatling’s workload models is what the "rate" counts. In an open model, constantUsersPerSec(10) means Gatling starts 10 new users every second, irrespective of how long each one takes. No user sends a second request before its first one returns, but if your server takes 5 seconds to respond, about 10 × 5 = 50 requests are in flight at once, and that number keeps climbing if the server slows down further. In contrast, a closed model with constantConcurrentUsers(10), a 5-second response, and a 1-second think time gives each user a 6-second cycle, capping your throughput at 10 users / 6 seconds, or roughly 1.67 requests per second, while keeping exactly 10 users active.

The next concept you’ll grapple with is how to accurately model realistic user behavior, often requiring a hybrid approach or very specific injection profiles.