Run Baseline, Stress, and Spike Tests with Gatling (2026)

Gatling doesn’t just simulate load; it simulates user behavior, which is fundamentally different from simple thread-per-request load generators.

Let’s see it in action. Imagine we’re testing a simple API endpoint that returns a greeting.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BasicApiSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080") // The base URL of your application
    .acceptHeader("application/json") // What we expect back
    .doNotTrackHeader("1") // Send the DNT header with value 1
    .disableCaching

  val scenario1 = scenario("User Scenario")
    .exec(http("Request Greeting")
      .get("/hello?name=Gatling")
      .check(status.is(200)) // Assert the response status code
      .check(jsonPath("$.message").is("Hello Gatling!")) // Assert the response body content
    )

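  setUp(
    scenario1.inject(
      rampConcurrentUsers(0).to(10).during(10.seconds), // Ramp from 0 to 10 concurrent users over 10 seconds
      constantConcurrentUsers(10).during(20.seconds) // Hold 10 concurrent users for 20 seconds
    ).protocols(httpProtocol)
  )
}

This BasicApiSimulation defines a single scenario. It ramps from 0 to 10 concurrent users over 10 seconds, then holds that load for another 20 seconds. The http block configures the base URL and common headers. The exec block describes the user’s actions: making a GET request to /hello?name=Gatling, then checking that the HTTP status is 200 and that the message field of the JSON response equals "Hello Gatling!".

The setUp block is where the load profile is defined, through scenario1.inject(...). Gatling distinguishes two workload models. The closed model, used here, controls the number of concurrent users: rampConcurrentUsers(0).to(10).during(10.seconds) starts at 0 users and increases the count linearly until it reaches 10 concurrent users, and constantConcurrentUsers(10).during(20.seconds) then keeps 10 users active for the next 20 seconds, starting a new user whenever one finishes. The open model controls the arrival rate instead: rampUsers(10).during(10.seconds) starts 10 users in total, spread evenly over 10 seconds, and constantUsersPerSec(10).during(20.seconds) starts 10 new users every second for 20 seconds. Open and closed steps cannot be mixed in the same inject call.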
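The same injection building blocks can be arranged into baseline, stress, and spike profiles. The following is a minimal sketch rather than a recommended configuration: the user counts, durations, and the profile system property are assumptions chosen for illustration, and how a -Dprofile=... value reaches the Gatling JVM depends on your build tool.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class LoadProfilesSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080")
    .acceptHeader("application/json")

  val greeting = scenario("Greeting")
    .exec(
      http("Request Greeting")
        .get("/hello?name=Gatling")
        .check(status.is(200))
    )

  // Baseline (closed model): a modest, steady level of concurrency.
  val baseline = greeting.inject(
    rampConcurrentUsers(0).to(10).during(30.seconds),
    constantConcurrentUsers(10).during(5.minutes)
  )

  // Stress (closed model): step concurrency through 10, 20, 30, 40, 50 users.
  val stress = greeting.inject(
    incrementConcurrentUsers(10)
      .times(5)
      .eachLevelLasting(1.minute)
      .separatedByRampsLasting(10.seconds)
      .startingFrom(10)
  )

  // Spike (open model): light background arrivals, then 200 users at once.
  val spike = greeting.inject(
    constantUsersPerSec(2).during(1.minute),
    atOnceUsers(200),
    constantUsersPerSec(2).during(1.minute)
  )

  // Hypothetical switch: pick a profile with -Dprofile=baseline|stress|spike.
  val selected = sys.props.getOrElse("profile", "baseline") match {
    case "stress" => stress
    case "spike"  => spike
    case _        => baseline
  }

  setUp(selected).protocols(httpProtocol)
}
```

Keeping the three profiles in one simulation means the scenario and protocol are shared, so differences between runs come from the load shape alone.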

This is the mental model behind Gatling: it’s not about firing requests as fast as possible; it’s about virtual users who arrive, perform actions over time, and leave. This distinction is crucial for realistic performance testing.

The core problem Gatling solves is accurately simulating how real users interact with a system. Unlike tools that just blast requests, Gatling models users as stateful entities. A user "arrives," performs a sequence of actions (HTTP requests, possibly with pauses in between), and then might "leave" or repeat the sequence. This makes the simulation much closer to reality, allowing you to uncover bottlenecks that arise from complex user flows, not just raw request volume.

Internally, Gatling is asynchronous and non-blocking, built on Netty for I/O (older releases also relied on Akka actors for orchestration). Virtual users are implemented as lightweight messages flowing through the scenario’s actions rather than as dedicated threads, which is why a single load generator can sustain thousands of them. The user behavior in this example is written in the Scala DSL, which is compiled to JVM bytecode, so the simulation code itself adds little overhead.

The key levers you control are:

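Load Profiles (inject): How users are introduced into the system over time. Open-model steps include rampUsers, constantUsersPerSec, rampUsersPerSec, atOnceUsers, and nothingFor; closed-model steps include rampConcurrentUsers and constantConcurrentUsers. Steps of the same model can be combined in sequence. This directly influences the concurrency and arrival rate.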
Scenario Steps (exec): The sequence of actions a user performs. This includes HTTP requests, assertions (check), pauses (pause), conditional logic, loops, and more complex orchestrations. This defines the user journey.
HTTP Configuration (http): Base URL, headers, connection pooling, caching, etc. This affects how efficiently requests are made.
Assertions (check): Crucial for validating correctness under load. You’re not just measuring performance; you’re measuring performance while the application behaves as expected.
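To make the scenario-level levers concrete, here is a small sketch of a journey that combines exec, check, pause, and repeat. The "greeting" session key, the "Repeat Greeting" request name, and the think times are assumptions for illustration, and the #{...} expression syntax assumes Gatling 3.7 or later.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class GreetingJourneySimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080")
    .acceptHeader("application/json")

  val journey = scenario("Greeting Journey")
    .exec(
      http("Request Greeting")
        .get("/hello?name=Gatling")
        .check(status.is(200))
        // Store the message in this virtual user's session.
        .check(jsonPath("$.message").saveAs("greeting"))
    )
    // Random think time between 1 and 3 seconds.
    .pause(1.second, 3.seconds)
    // Repeat the request three times, pausing after each one.
    .repeat(3) {
      exec(
        http("Repeat Greeting")
          .get("/hello?name=Gatling")
          // Compare against the value saved earlier in the session.
          .check(jsonPath("$.message").is("#{greeting}"))
      ).pause(2.seconds)
    }

  setUp(
    journey.inject(constantConcurrentUsers(5).during(1.minute))
  ).protocols(httpProtocol)
}
```

Because each virtual user carries its own session, the value saved by the first request is private to that user, which is what makes the later comparison meaningful under concurrency.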

When you run a Gatling simulation, it generates a detailed HTML report. This report is your window into the system’s performance. It shows metrics like response times (average, percentiles), throughput (requests per second), and error rates. The most valuable part is often the breakdown by request, allowing you to pinpoint which specific API calls are struggling.
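The same metrics that appear in the report can also be turned into pass/fail criteria with setUp(...).assertions(...). The thresholds below are placeholders rather than recommendations, and percentile3 refers to the 95th percentile only under Gatling's default configuration.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class AssertedSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080")
    .acceptHeader("application/json")

  val greeting = scenario("Greeting")
    .exec(
      http("Request Greeting")
        .get("/hello?name=Gatling")
        .check(status.is(200))
    )

  setUp(
    greeting.inject(constantConcurrentUsers(10).during(30.seconds))
  ).protocols(httpProtocol)
    .assertions(
      // 95th percentile response time (default percentile3) under 500 ms.
      global.responseTime.percentile3.lt(500),
      // More than 99% of all requests must succeed.
      global.successfulRequests.percent.gt(99.0),
      // The slowest "Request Greeting" call must stay under 1 second.
      details("Request Greeting").responseTime.max.lt(1000)
    )
}
```

When an assertion fails, the run is reported as failed, which makes these thresholds usable as a gate in a CI pipeline.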

Most people understand injection steps such as rampUsers and constantUsersPerSec as ways to increase load. What’s less obvious is how the pause step within a scenario directly influences the effective concurrency in the open model. Concurrency is roughly the arrival rate multiplied by the time each user spends in the scenario. If a user pauses for 5 seconds after each request, and your scenario has 3 requests, each user stays active for at least 15 seconds. Sustaining 100 concurrent users therefore takes only about 100 / 15 ≈ 6.7 new users per second, while injecting 100 users per second would push concurrency toward 1,500. This is how Gatling models the system’s capacity to keep users engaged over time, not just process requests in isolation.
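The relationship between arrival rate, session length, and concurrency (Little's law) can be checked with a few lines of plain Scala. This is a back-of-the-envelope sketch: the ConcurrencyEstimate object and its function names are made up for illustration, and it assumes every user runs the full scenario with fixed pauses and a known average response time.

```scala
object ConcurrencyEstimate {

  // Time one virtual user spends in the scenario, in seconds.
  def sessionSeconds(requests: Int, pauseSeconds: Double, responseSeconds: Double): Double =
    requests * (pauseSeconds + responseSeconds)

  // Little's law: average concurrency = arrival rate * time in system.
  def concurrentUsers(arrivalsPerSecond: Double, sessionSeconds: Double): Double =
    arrivalsPerSecond * sessionSeconds

  // Arrival rate needed to sustain a target concurrency.
  def arrivalsPerSecond(targetConcurrency: Double, sessionSeconds: Double): Double =
    targetConcurrency / sessionSeconds
}

object ConcurrencyEstimateDemo extends App {
  // 3 requests, 5-second pause after each, response time ignored.
  val session = ConcurrencyEstimate.sessionSeconds(3, 5.0, 0.0)

  println(f"Session length: $session%.1f s")
  println(f"Rate for 100 concurrent users: ${ConcurrencyEstimate.arrivalsPerSecond(100.0, session)}%.2f users/s")
  println(f"Concurrency at 100 users/s: ${ConcurrencyEstimate.concurrentUsers(100.0, session)}%.0f users")
}
```

In practice response times grow under load, so the real session length, and with it the concurrency, tends to be higher than this estimate.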

The next concept you’ll likely explore is how to manage multiple, complex user journeys simultaneously within a single simulation.