Gatling Enterprise is not a single monolithic application but a distributed system designed to generate large amounts of load from multiple machines.
Let’s see it in action. Imagine you’re testing a critical e-commerce API. You’ve written your Gatling simulation, which defines user behavior like browsing products, adding to cart, and checking out. To simulate 100,000 concurrent users, you can’t run this on one machine.
Here’s a basic Gatling Enterprise setup for distributed testing:
1. Controller Node: This is the brain. It orchestrates the entire test.
- Configuration:
  - gatling.enterprise.system.mode = controller
  - gatling.enterprise.clustering.seed-nodes = ["192.168.1.100:2551"] (The IP and port of another controller/seed node, if in a cluster)
  - gatling.enterprise.clustering.port = 2551 (The port the controller listens on for cluster communication)
  - gatling.enterprise.clustering.host = 192.168.1.100 (The IP address the controller binds to)
- Action: You trigger a simulation run from the Gatling Enterprise UI or API. The controller then decides how to distribute the load generation tasks.
2. Simulation Nodes (Load Generators): These are the workhorses. They actually run the Gatling simulation and generate the HTTP requests.
- Configuration:
  - gatling.enterprise.system.mode = simulation
  - gatling.enterprise.clustering.seed-nodes = ["192.168.1.100:2551"] (The IP and port of the controller or another seed node)
  - gatling.enterprise.clustering.port = 2551 (The port the simulation node listens on for cluster communication)
  - gatling.enterprise.clustering.host = 192.168.1.101 (The IP address the simulation node binds to)
  - gatling.enterprise.clustering.controller.host = 192.168.1.100 (The IP of the controller node)
  - gatling.enterprise.clustering.controller.port = 2551 (The port of the controller node)
- Action: When the controller assigns a task (e.g., "run scenario X with 10,000 users") to a simulation node, that node spins up a Gatling process, executes the scenario, and sends its results back to the controller.
3. Results Aggregation: The controller gathers results from all simulation nodes.
- Action: As simulation nodes complete their assigned parts of the test, they stream metrics (request counts, response times, errors) back to the controller. The controller aggregates these into the final reports you see in the Gatling Enterprise UI.
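To make the aggregation step concrete, here is a minimal sketch of merging per-node metrics into one summary. It is not Gatling Enterprise's actual code: the `PartialResult` shape, the `aggregate` function, and the choice of a nearest-rank 95th percentile are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PartialResult:
    """Metrics reported by one simulation node (hypothetical shape)."""
    node: str
    request_count: int
    error_count: int
    response_times_ms: List[int] = field(default_factory=list)


def aggregate(partials: List[PartialResult]) -> Dict[str, Optional[int]]:
    """Merge per-node results into a single summary, as a controller might."""
    total_requests = sum(p.request_count for p in partials)
    total_errors = sum(p.error_count for p in partials)

    # Pool the raw timings from every node before computing the percentile,
    # so the figure describes the whole test rather than a single node.
    timings = sorted(t for p in partials for t in p.response_times_ms)

    p95 = None
    if timings:
        # Nearest-rank percentile: rank = ceil(0.95 * n), using integer math.
        rank = (95 * len(timings) + 99) // 100
        p95 = timings[rank - 1]

    return {"requests": total_requests, "errors": total_errors, "p95_ms": p95}
```

Pooling the raw timings is a deliberate choice in this sketch: averaging each node's own percentile would generally not give the percentile of the combined traffic.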
The core problem Gatling Enterprise’s architecture solves is the limitation of single-machine load generation. A single machine has finite CPU, memory, and network bandwidth. To simulate tens of thousands or millions of users, you must distribute the load generation process across multiple machines. Gatling Enterprise uses an Akka Cluster under the hood to manage this distribution.
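As a rough illustration of the sizing arithmetic, the sketch below estimates how many load generators a target user count needs. The per-node capacity is an assumption you would have to measure for your own simulation and hardware; `nodes_needed` is a hypothetical helper, not part of Gatling.

```python
def nodes_needed(target_users: int, users_per_node: int) -> int:
    """Return the minimum number of nodes that can carry target_users.

    users_per_node is an assumed, measured capacity of a single machine.
    """
    if users_per_node <= 0:
        raise ValueError("users_per_node must be positive")
    # Ceiling division: a partially filled node still counts as a node.
    return -(-target_users // users_per_node)


# With the figures used in this article: 100,000 users at an assumed
# 10,000 users per node requires 10 nodes.
print(nodes_needed(100_000, 10_000))
```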
Internally, the controller acts as a cluster singleton (meaning only one controller instance is active at a time for a given cluster, ensuring consistent orchestration). Simulation nodes are regular cluster members that the controller can communicate with. The controller assigns "simulation tasks" to available simulation nodes. A task is essentially a portion of the total load, defined by a specific scenario and user count.
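The idea of a task as "a portion of the total load" can be sketched as a simple split of users across nodes. The even split below is an assumption for illustration; the source does not describe the actual distribution policy, and `split_users` is a hypothetical helper.

```python
from typing import Dict, List


def split_users(total_users: int, nodes: List[str]) -> Dict[str, int]:
    """Divide total_users across nodes as evenly as possible."""
    if not nodes:
        raise ValueError("at least one simulation node is required")
    base, extra = divmod(total_users, len(nodes))
    # The first `extra` nodes take one additional user so that the
    # per-node counts always sum back to total_users.
    return {
        node: base + (1 if i < extra else 0)
        for i, node in enumerate(nodes)
    }


tasks = split_users(100_000, ["192.168.1.101", "192.168.1.102", "192.168.1.103"])
print(tasks)
```

The node addresses here are placeholders in the same range as the example configuration above.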
The controller doesn’t run the simulation itself; it just manages the lifecycle and collects data. It sends instructions like "start simulation A with 5000 users on node B" and "stop simulation A on node C." Each simulation node then executes its assigned portion, sending back results via Akka Remoting. The controller then merges these partial results into a single, comprehensive report.
Think of it like a conductor (controller) directing an orchestra (simulation nodes). The conductor doesn’t play any instruments but tells each musician when and how to play, and then synthesizes the combined sound.
The most surprising thing is how seamlessly Akka Cluster handles node failures and reassignments. If a simulation node crashes mid-test, the controller detects it and can reassign that node’s remaining work to other available simulation nodes, ensuring the test can continue with minimal disruption. This resilience is built into the underlying clustering technology.
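The reassignment idea can be sketched as follows, under the assumption that the controller knows how many users the failed node still had to run and spreads them evenly over the survivors. This is an illustrative model only; `reassign` is a hypothetical function and does not reflect how Akka Cluster or Gatling Enterprise implement failure handling.

```python
from typing import Dict


def reassign(assignments: Dict[str, int], failed_node: str,
             remaining_users: int) -> Dict[str, int]:
    """Spread a failed node's remaining users over the surviving nodes."""
    survivors = [n for n in assignments if n != failed_node]
    if not survivors:
        raise RuntimeError("no simulation nodes left to take over the work")

    # Start from the survivors' existing load, dropping the failed node.
    updated = {n: assignments[n] for n in survivors}
    base, extra = divmod(remaining_users, len(survivors))
    for i, node in enumerate(survivors):
        updated[node] += base + (1 if i < extra else 0)
    return updated
```

A real system would also need to check that the survivors have spare capacity before handing them extra users; this sketch skips that check.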
The next step in understanding distributed load testing is how to manage network latency and ensure your simulation nodes are geographically distributed to accurately mimic real-world user locations.