The most counterintuitive thing about tuning Gatling for accurate load testing is that performance gains often come from reducing the number of threads, not increasing them.

Let’s see Gatling in action. Imagine we’re testing a simple API endpoint that returns user data.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class UserSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080") // The base URL of the service we're testing
    .acceptHeader("application/json") // We expect JSON responses
    .contentTypeHeader("application/json") // We send JSON data

  val scn = scenario("User Data Retrieval")
    .exec(
      http("Request User Data")
        .get("/users/123") // The specific endpoint
        .check(status.is(200)) // We expect a 200 OK response
    )

  // Inject 100 virtual users at once and apply the HTTP protocol settings above
  setUp(scn.inject(atOnceUsers(100))).protocols(httpProtocol)
}

This simulation starts 100 virtual users at once; each sends a single GET request to http://localhost:8080/users/123 and checks for a 200 status code. When you run this, Gatling generates a report showing metrics like requests per second, response times, and error rates. The goal of tuning is to make these numbers reflect the real performance of your application under load, not the limitations of your testing setup.

The problem Gatling solves is simulating a large number of users interacting with your application concurrently. Traditional methods often involve spinning up many separate processes or threads, each acting as a user. This can quickly overwhelm the testing machine itself, leading to inaccurate results where the bottleneck isn’t your application, but your load generator. Gatling, built on Scala and Akka, uses a highly efficient, asynchronous, non-blocking architecture. This allows a single Gatling process to manage thousands of virtual users with relatively low resource overhead on the machine running Gatling.
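To make the non-blocking idea concrete, here is a minimal sketch that is not Gatling's internals, just an illustration using the JDK and the Scala standard library. Each "virtual user" waits 100 ms for a pretend response, but the wait is a scheduled timer rather than a sleeping thread, so two threads are enough for thousands of users. The object name, method name, and the 100 ms delay are all assumptions made for the example.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object NonBlockingUsersSketch {

  // Runs `users` virtual users on a fixed pool of `threads` threads.
  // Returns the number of users that completed and the elapsed time in ms.
  def simulate(users: Int, threads: Int): (Int, Long) = {
    val pool = Executors.newScheduledThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val completed = new AtomicInteger(0)
    val start = System.nanoTime()

    val all = (1 to users).map { _ =>
      val response = Promise[Unit]()
      // Schedule a timer instead of blocking a thread while "waiting"
      pool.schedule(
        new Runnable { def run(): Unit = response.success(()) },
        100L,
        TimeUnit.MILLISECONDS
      )
      // Count the user as done once its pretend response arrives
      response.future.map(_ => completed.incrementAndGet())
    }

    Await.result(Future.sequence(all), 30.seconds)
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    pool.shutdown()
    (completed.get(), elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    val (done, ms) = simulate(users = 10000, threads = 2)
    println(s"$done virtual users completed in $ms ms on 2 threads")
  }
}
```

A thread-per-user design would need one thread for every user that is waiting at the same moment; here the thread count stays fixed no matter how many users are in flight.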

The core levers you control are in the setUp block and the JVM settings of the machine running Gatling.

Simulation Settings (in setUp)

  • Injection profile: The steps you pass to inject define how users are injected over time.

    • atOnceUsers(n): Starts n users immediately. Good for a quick spike test.
    • rampUsers(n) during (duration): Injects n users in total, spread evenly over duration. Simulates users arriving gradually.
    • constantUsersPerSec(rate) during (duration): Injects new users at a constant rate per second for a duration. Excellent for steady-state load.
    • rampUsersPerSec(rate1) to (rate2) during (duration): Gradually changes the arrival rate from rate1 to rate2 users per second.
    • stressPeakUsers(n) during (duration): Injects n users along a smoothed step curve stretched over duration, so arrivals peak sharply in the middle of the window. Useful for stress tests that look for breaking points.

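    Example:
      setUp(scn.inject(rampUsers(1000) during (30.seconds))).protocols(httpProtocol)

    This injects 1000 users in total, spread evenly over 30 seconds (roughly 33 new users per second). It does not mean 1000 users are active at the same time; how many overlap depends on how long each user takes to finish.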

  • maxDuration: Chain .maxDuration(duration) onto setUp to cap the run time of the entire simulation and prevent it from running indefinitely. To bound an individual scenario, wrap its steps in a during(duration) loop inside the scenario definition.

  • scenario definitions: You can define multiple scenarios and inject users into them independently, allowing you to simulate complex user journeys.
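The settings above can be combined in a single setUp call. The following is a minimal sketch, assuming Gatling 3.x and the same service as the earlier example; the class name, the second "Browse Users" scenario, the /users endpoint, and all rates and durations are hypothetical values chosen for illustration.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class MixedLoadSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080")
    .acceptHeader("application/json")

  // Same journey as the earlier example
  val lookup = scenario("User Data Retrieval")
    .exec(http("Request User Data").get("/users/123").check(status.is(200)))

  // Hypothetical second journey, added only to show independent injection
  val browse = scenario("Browse Users")
    .exec(http("List Users").get("/users").check(status.is(200)))

  setUp(
    // Injection steps run in sequence: warm up, then hold a steady arrival rate
    lookup.inject(
      rampUsersPerSec(1).to(20).during(1.minute),
      constantUsersPerSec(20).during(5.minutes)
    ),
    // A second population, injected independently of the first
    browse.inject(rampUsers(300).during(2.minutes))
  ).protocols(httpProtocol)
    .maxDuration(10.minutes) // hard stop for the whole simulation
}
```

Ramping the arrival rate before holding it steady gives the application time to warm up, so the steady-state numbers are less likely to be skewed by cold caches or JIT compilation.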

JVM Settings (on the machine running Gatling)

These are critical because Gatling is a JVM application.

  • Heap Size (-Xms and -Xmx): This is the most common tuning point. If Gatling runs out of memory, it will slow down dramatically or crash.

    • Diagnosis: Monitor JVM heap usage using jstat -gcutil <pid> <interval> or your system’s monitoring tools. Look for consistently high utilization (e.g., 90%+) or frequent Full GCs.
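    • Fix: Adjust JAVA_OPTS or GATLING_OPTS before running Gatling. Which variable is honored depends on the launcher script or build plugin you use, so confirm that the flags actually appear on the running JVM's command line. For example, to set initial and maximum heap to 4GB: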
      export GATLING_OPTS="-Xms4g -Xmx4g"
      ./bin/gatling.sh
      
    • Why it works: A larger heap provides more working memory for Gatling’s internal data structures (for example, per-user session state) and gives the garbage collector more headroom. This prevents excessive garbage collection pauses and out-of-memory errors. Thread stacks are not part of the heap; they are sized separately with -Xss.
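  • Garbage Collector: The default GC might not be optimal for high-throughput, short-lived object creation. On JDK 9 and later the default is usually G1 already, so check which collector is in use before changing it.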

    • Diagnosis: Observe GC logs (enable with -Xlog:gc*:file=gc.log on JDK 9 and later). Frequent, long Full GCs are a red flag.
    • Fix: Try G1GC (-XX:+UseG1GC) or ZGC (-XX:+UseZGC), which targets very large heaps and low-latency needs. For instance, to use G1GC:
      export GATLING_OPTS="-Xms4g -Xmx4g -XX:+UseG1GC"
      
    • Why it works: Different GCs are optimized for different workloads. G1GC is generally good for large heaps and aims for predictable pause times. ZGC is designed for extremely low pause times, crucial for preventing Gatling from becoming the bottleneck.
  • File Descriptors (ulimit -n): Each connection Gatling makes requires a file descriptor.

    • Diagnosis: If you see java.io.IOException: Too many open files errors in Gatling’s logs, your file descriptor limit is too low. Check with ulimit -n.
    • Fix: Increase the limit for the user running Gatling. In bash (this applies only to the current shell session, so start Gatling from the same shell):
      ulimit -n 65536
      
      For persistent changes, edit /etc/security/limits.conf.
    • Why it works: This allows the operating system to track and manage a larger number of network connections concurrently, preventing Gatling from being starved of available handles.
  • Thread Stack Size (-Xss): While less common, a process that creates a very large number of threads can exhaust the native memory reserved for thread stacks.

    • Diagnosis: java.lang.OutOfMemoryError: unable to create new native thread. This is rare for typical Gatling simulations, which run on a small number of threads, but possible if your simulation code spawns its own threads or relies on large blocking thread pools. Note that deep recursion is a different failure: it raises StackOverflowError and calls for a larger -Xss, not a smaller one.
    • Fix: Reduce stack size. Default is often 1MB. Try export GATLING_OPTS="-Xss512k".
    • Why it works: Less stack space per thread means more threads can be created before hitting the memory or OS limit on native threads.
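Because flags set through an environment variable can be silently ignored by a launcher, it can help to ask the running JVM what it actually received. The following is a minimal sketch using only standard JDK management APIs; the object and method names are hypothetical, and the import of scala.jdk.CollectionConverters assumes Scala 2.13 or later.

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

object JvmSettingsCheck {

  // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx)
  def maxHeapMiB: Long =
    Runtime.getRuntime.maxMemory() / (1024L * 1024L)

  // Names of the active garbage collectors, as reported by the JVM
  def collectorNames: Seq[String] =
    ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getName).toSeq

  // Flags passed to the JVM on startup (e.g. -Xms4g, -XX:+UseG1GC, -Xss512k)
  def jvmArguments: Seq[String] =
    ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.toSeq

  // CPU cores visible to the JVM
  def availableCores: Int =
    Runtime.getRuntime.availableProcessors()

  def main(args: Array[String]): Unit = {
    println(s"Max heap:   $maxHeapMiB MiB")
    println(s"Collectors: ${collectorNames.mkString(", ")}")
    println(s"JVM args:   ${jvmArguments.mkString(" ")}")
    println(s"CPU cores:  $availableCores")
  }
}
```

The same calls could be made from inside a simulation, for instance from its before hook, so the values land in the run's console output next to the results.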

One subtle, yet powerful, tuning knob is the simulation.threads.quota setting in Gatling’s configuration. Configuration keys change between Gatling releases, so confirm that this key appears in the gatling.conf reference for your version before relying on it. By default, Gatling uses a dynamic number of Netty event loops (threads) up to a system-dependent maximum. If you’re observing high CPU on your Gatling machine and your application is still not hitting its limits, you might be overwhelming the Gatling process itself. Reducing this quota, for example in gatling.conf:

gatling {
  simulation {
    threads {
      quota = 1000 // Limit to 1000 Netty event loop threads
    }
  }
}

can sometimes improve stability and allow you to push more effective load to your application by preventing Gatling’s internal processing from becoming the bottleneck. This is where the counterintuitive idea of reducing threads for better results comes in.

The next step after tuning Gatling is understanding how to interpret the detailed report and correlate Gatling’s metrics with your application’s own performance monitoring.
