Measure P50, P95, and P99 Latency Percentiles in Gatling (2026)

Gatling doesn’t time percentiles directly. It records the response time of every individual request, then derives P50, P95, and P99 from that distribution when it builds the report.

Here’s a Gatling simulation that sends two requests per virtual user. The response times it records are what the built-in HTML report summarizes.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BasicSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080") // Replace with your application's base URL
    .acceptHeader("application/json")
    .doNotTrackHeader("1")
    .userAgentHeader("Mozilla/5.0")

  val scn = scenario("Basic Load Test")
    .exec(http("Request 1")
      .get("/resource")
      .check(status.is(200))) // Basic check to ensure the request was successful
    .pause(1) // Pause for 1 second between requests
    .exec(http("Request 2")
      .post("/another-resource")
      .body(StringBody("""{"key": "value"}""")).asJson
      .check(status.is(201)))

  setUp(
    scn.inject(
      rampUsers(10).during(10.seconds), // Start 10 users, spread evenly over 10 seconds
      constantUsersPerSec(5).during(20.seconds) // Then start 5 new users every second for 20 seconds
    ).protocols(httpProtocol)
  )
}

When you run this simulation, Gatling generates an HTML report. The statistics table in the report lists each request defined in your scenario (e.g., "Request 1", "Request 2") alongside its response-time statistics: "Min", "50th pct", "75th pct", "95th pct", "99th pct", "Max", "Mean", and "Std Dev". With the default settings, the "50th pct", "95th pct", and "99th pct" columns are the P50, P95, and P99 response times for that specific request. The four percentiles shown are configurable in gatling.conf.
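The same percentiles can also be checked automatically at the end of a run with Gatling's assertions API, so a regression fails the build instead of waiting for someone to read the report. The following is a minimal sketch rather than a recommended setup: the class name and the thresholds (200 ms, 800 ms, 1200 ms) are hypothetical placeholders, and the target URL is the same assumed localhost address as above.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Hypothetical simulation; thresholds are placeholders, not recommendations.
class PercentileAssertionSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080") // Assumed target, as in the simulation above
    .acceptHeader("application/json")

  val scn = scenario("Percentile Assertions")
    .exec(http("Request 1")
      .get("/resource")
      .check(status.is(200)))

  setUp(
    scn.inject(constantUsersPerSec(5).during(20.seconds)).protocols(httpProtocol)
  ).assertions(
    // Median over all requests, addressed by explicit percentile value
    global.responseTime.percentile(50).lt(200),
    // percentile3 is the third configured percentile (95th by default)
    global.responseTime.percentile3.lt(800),
    // percentile4 is the fourth configured percentile (99th by default), for one request only
    details("Request 1").responseTime.percentile4.lt(1200)
  )
}
```

If any assertion fails, Gatling reports the failure at the end of the run and exits with a non-zero status, which is what a CI job would typically key on.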

The core problem Gatling addresses is the inadequacy of simple averages for understanding performance. A mean latency of around 150 ms might sound acceptable, but you get the same mean when 95% of requests finish in 50 ms and 5% take 2 seconds, so the average hides a critical bottleneck. Percentiles like P95 and P99 reveal these tail latencies, showing the experience of your least-served users. Gatling’s HTTP module times every request from send to response and aggregates those durations per request name. It computes the percentiles with algorithms designed to keep memory and CPU use modest, which makes the approach suitable for high-throughput load testing.
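To make the gap between the mean and the tail concrete, here is a small standalone sketch in plain Scala (no Gatling required). The sample data is hypothetical: 94 requests at 50 ms and 6 at 2000 ms. The `percentile` helper uses the textbook nearest-rank definition, which is an assumption chosen for illustration and not necessarily the algorithm Gatling uses internally.

```scala
object TailLatencyDemo {
  // Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it.
  def percentile(samples: Seq[Long], p: Double): Long = {
    require(samples.nonEmpty, "need at least one sample")
    require(p > 0 && p <= 100, "p must be in (0, 100]")
    val sorted = samples.sorted
    val rank = math.ceil(p * sorted.size / 100.0).toInt // 1-based position in ascending order
    sorted(rank - 1)
  }

  def mean(samples: Seq[Long]): Double = samples.sum.toDouble / samples.size

  // Hypothetical response times in milliseconds: 94 fast requests, 6 slow ones.
  val samples: Seq[Long] = Seq.fill(94)(50L) ++ Seq.fill(6)(2000L)

  def main(args: Array[String]): Unit = {
    println(f"mean = ${mean(samples)}%.1f ms")
    println(s"P50  = ${percentile(samples, 50)} ms")
    println(s"P95  = ${percentile(samples, 95)} ms")
    println(s"P99  = ${percentile(samples, 99)} ms")
  }
}
```

For this data the mean is 167 ms and the median is 50 ms, while P95 and P99 both land on 2000 ms. Neither the mean nor the median hints that some users wait two full seconds.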

The baseUrl in the httpProtocol is your single point of truth for the target application’s address. The rampUsers and constantUsersPerSec in setUp define how your virtual users are introduced into the system over time, simulating different load patterns. The pause statement is crucial for making your simulation more realistic by mimicking human think times between actions.
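The injection profile also determines how many samples each percentile rests on. The sketch below works through the arithmetic for the profile used in this simulation, assuming every virtual user runs the scenario to completion. The object and helper names are hypothetical and exist only for this illustration.

```scala
object InjectionProfileMath {
  // rampUsers(n).during(d) starts exactly n users, spread evenly across d.
  def rampTotal(users: Int): Long = users.toLong

  // constantUsersPerSec(rate).during(d) starts `rate` new users every second for d seconds.
  def constantRateTotal(usersPerSec: Double, seconds: Long): Long =
    math.round(usersPerSec * seconds)

  // Profile from the simulation: rampUsers(10) then constantUsersPerSec(5) for 20 seconds.
  val totalUsers: Long = rampTotal(10) + constantRateTotal(5.0, 20)

  // Each user sends "Request 1" and "Request 2" once.
  val requestsPerUser: Int = 2
  val totalRequests: Long = totalUsers * requestsPerUser

  def main(args: Array[String]): Unit = {
    println(s"users started:            $totalUsers")
    println(s"requests sent:            $totalRequests")
    println(s"samples per request name: $totalUsers")
  }
}
```

With only 110 samples per request name, the P99 is decided by the slowest one or two responses, so a longer or heavier run is usually needed before the tail percentiles become stable.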

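The strength of Gatling’s percentile reporting is that it is derived from the response times actually recorded during the run, not from a model of expected behavior. Conceptually, the P99 for a request is found by sorting all of its recorded response times from fastest to slowest and taking the value 99% of the way through the list. With 1000 requests, that is the 990th value in ascending order: 990 requests completed at least that fast, and only the 10 slowest took longer. On large runs the reported figure may differ slightly from an exact sort, since percentile aggregation typically relies on compact summaries rather than keeping every sample.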
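The position arithmetic can be sketched in a few lines of plain Scala. This uses the nearest-rank convention, which is one common textbook definition and an assumption here, not a statement about Gatling's internal implementation. The object and function names are hypothetical.

```scala
object NearestRankDemo {
  // 1-based position of the p-th percentile among n samples sorted in ascending order.
  def nearestRank(p: Double, n: Int): Int = {
    require(n > 0, "need at least one sample")
    require(p > 0 && p <= 100, "p must be in (0, 100]")
    math.ceil(p * n / 100.0).toInt
  }

  def main(args: Array[String]): Unit = {
    val n = 1000
    val rank = nearestRank(99, n)
    println(s"P99 of $n samples sits at ascending position $rank")
    println(s"samples slower than P99: ${n - rank}")
  }
}
```

For 1000 samples the rank is 990, leaving 10 slower requests. Other conventions interpolate between neighbouring samples, so exact values can vary a little from tool to tool.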

The next step after understanding your latency percentiles is to correlate them with other metrics, such as error rates and throughput, to get a holistic view of your application’s performance under load.