A load testing tool can actually cause the very performance problems it’s designed to find.

Let’s say you’re trying to simulate a sudden, massive influx of users to your web application – a "spike test." You’ve set up Locust, your favorite Python-based load testing framework, and you’re ready to unleash the hounds. You configure your locustfile.py to have a few basic user behaviors:

from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5) # Users wait 1-5 seconds between tasks

    @task
    def index(self):
        self.client.get("/")

    @task
    def about(self):
        self.client.get("/about/")

You spin up the Locust web UI, set your desired number of users and the spawn rate, and hit "Start swarming." Initially, everything looks good. Your application handles the load, response times are within acceptable limits, and your metrics are green.

Then, you decide to simulate that sudden surge. You increase the number of users dramatically over a very short period. This is where things can get hairy.

The System in Action: Visualizing the Spike

Imagine your application is a busy restaurant. Customers (users) arrive at a steady pace, and the kitchen (your application servers) can handle them. Now, picture a bus of 50 tourists arriving all at once. The restaurant gets slammed. Wait times skyrocket, orders get mixed up, and the kitchen might even shut down temporarily.

In Locust, this surge is configured by setting a high "Spawn rate" in the web UI. Locust measures spawn rate in users per second. For example, if you have 100 users running and set the spawn rate to 10 users per second, reaching 10,000 users takes about 990 seconds (9,900 new users / 10 per second), or roughly 16.5 minutes. But for a spike test, you might want to reach that 10,000 user count in, say, 30 seconds. That means a spawn rate of about 334 users per second (10,000 users / 30 seconds ≈ 333.3, rounded up).

Here’s what that looks like in the Locust UI:

  • Total Users: 10,000
  • Spawn Rate: 334 (users per second)
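
The arithmetic behind a spawn-rate setting is easy to get wrong by hand, so here is a minimal sketch of it. The function names required_spawn_rate and ramp_seconds are made up for illustration and are not part of Locust.

```python
import math

def required_spawn_rate(target_users, ramp_seconds):
    """Users per second needed to reach target_users within ramp_seconds."""
    if ramp_seconds <= 0:
        raise ValueError("ramp_seconds must be positive")
    # Round up so the ramp finishes inside the window rather than just after it.
    return math.ceil(target_users / ramp_seconds)

def ramp_seconds(current_users, target_users, spawn_rate):
    """Seconds needed to go from current_users to target_users at spawn_rate."""
    if spawn_rate <= 0:
        raise ValueError("spawn_rate must be positive")
    return (target_users - current_users) / spawn_rate

print(required_spawn_rate(10_000, 30))  # 334
print(ramp_seconds(100, 10_000, 10))    # 990.0
```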

When you hit "Start swarming," Locust’s master process tells its workers to start spawning users. Each worker runs a set of "greenlets" (lightweight coroutines). As Locust starts these greenlets, they begin executing your locustfile.py tasks. If the application can’t keep up with the rate at which these new greenlets are hitting it, you’ll start seeing the symptoms of overload: high response times, failed requests, and potentially even your application crashing.

The Mental Model: How Locust Simulates Load

When run in distributed mode, Locust uses a master-worker architecture. The master node manages the overall swarm and the web UI. Worker nodes execute the actual user simulations. In a plain single-process run, one process does both jobs.

  1. Master: Receives user input (total users, spawn rate).
  2. Workers: Each worker is responsible for spawning and running a subset of the total users. The master tells workers how many users to spawn and when.
  3. User Simulation: Each spawned "user" is essentially a Python greenlet running your HttpUser class. These greenlets independently execute the tasks defined in your locustfile.py, making HTTP requests to your target system.
  4. Rate Limiting (Internal): Locust itself has internal mechanisms to control the spawn rate. It uses a scheduler to ensure it doesn’t try to spawn users faster than the configured rate. In most tests, the bottleneck isn’t Locust’s spawning mechanism; it’s how quickly your target system can respond to the requests generated by those spawned users.
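
To get a feel for what each worker ends up carrying, the split can be approximated with a few lines of Python. This is a simplified even split under assumed numbers, not Locust’s actual dispatch logic, and split_across_workers is a hypothetical helper.

```python
def split_across_workers(total_users, spawn_rate, workers):
    """Evenly divide a user target and spawn rate across worker processes.

    Returns one (users, spawn_rate_per_second) tuple per worker.
    Simplification: real dispatch may also balance by user class and weight.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    base, extra = divmod(total_users, workers)
    per_worker_rate = spawn_rate / workers
    # The first `extra` workers take one additional user each.
    return [
        (base + (1 if i < extra else 0), per_worker_rate)
        for i in range(workers)
    ]

# Illustrative numbers: 10,000 users at 400 users/second over 4 workers.
for users, rate in split_across_workers(10_000, 400, 4):
    print(users, rate)  # 2500 100.0 on each of the four lines
```

The per-worker figures are the ones to watch: they show how much spawning and request work lands on a single process.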

Levers You Control:

  • Total Users: The peak number of concurrent simulated users.
  • Spawn Rate: How quickly Locust adds new users to the swarm. This is the primary lever for simulating sudden surges.
  • Wait Time: The between(min_seconds, max_seconds) in your HttpUser class. This dictates how long a simulated user pauses between executing tasks. A shorter wait time means more frequent requests per user.
  • Tasks: The specific endpoints and actions your simulated users perform. More complex or resource-intensive tasks will stress your system more.
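
How user count and wait time combine into request volume can be estimated with a rough closed-loop approximation: each user completes one request per (wait + response time) cycle. The sketch below assumes waits are spread uniformly between the two bounds and that the average response time is known; estimated_rps is a hypothetical helper, and the 0.2 second response time is an assumed figure.

```python
def estimated_rps(users, wait_min, wait_max, avg_response_seconds):
    """Approximate steady-state requests per second for a swarm.

    Assumes each user loops: send request, wait for the response,
    then pause for a uniformly distributed wait time.
    """
    mean_wait = (wait_min + wait_max) / 2
    cycle_seconds = mean_wait + avg_response_seconds
    return users / cycle_seconds

# 10,000 users with between(1, 5) and an assumed 0.2 s average response time
print(round(estimated_rps(10_000, 1, 5, 0.2)))  # 3125
```

Note that this is a steady-state figure. If response times climb under load, each cycle takes longer and the actual request rate drops below the estimate.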

The key insight for spike testing is that you’re not just increasing the number of users; you’re increasing the rate at which new users begin making requests. This rapid ramp-up is what stresses systems that might handle a steady state of users fine but buckle under sudden bursts.
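
Instead of typing values into the UI, a spike profile can be scripted with Locust’s LoadTestShape class, whose tick() method returns a (user_count, spawn_rate) tuple or None to stop the test. The following is a sketch under assumed stage timings and user counts; SpikeShape, STAGES, and stage_for are illustrative names, and the import fallback exists only so the stage logic can be tried without Locust installed.

```python
try:
    from locust import LoadTestShape
except ImportError:
    LoadTestShape = object  # fallback: lets stage_for() run without Locust

# (end_time_seconds, target_users, spawn_rate_per_second), all assumed values
STAGES = [
    (60, 100, 10),       # baseline: hold 100 users for the first minute
    (120, 10_000, 334),  # spike: ramp to 10,000 users in about 30 s, then hold
    (180, 100, 334),     # recovery: fall back to the baseline user count
]

def stage_for(run_time):
    """Return (users, spawn_rate) for the elapsed seconds, or None when done."""
    for end_time, users, spawn_rate in STAGES:
        if run_time < end_time:
            return users, spawn_rate
    return None

class SpikeShape(LoadTestShape):
    def tick(self):
        # Locust polls tick() roughly once per second during a run.
        return stage_for(self.get_run_time())
```

Placing a class like this in the same locustfile.py as the user classes lets Locust drive the user count from the shape rather than from the UI fields.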

The Surprise: Locust’s Own Resource Consumption

What most people don’t realize is that the Locust workers themselves can become a bottleneck, especially during very aggressive spike tests. If you’re spawning hundreds or thousands of users per second, each user represented by a greenlet, the sheer number of active greenlets and the network I/O they generate can consume significant CPU and memory on the worker machines. A worker that is maxed out can’t spawn users at the requested rate, send requests on schedule, or process responses promptly, so the timings it records include its own delays. The results become inaccurate and can mask real application performance issues. You might see Locust reporting low throughput or high latency, not because your application is failing, but because your load generator is struggling.
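
A back-of-the-envelope way to size the load generator is to measure what one worker process can sustain and divide the target request rate by that figure, leaving some headroom. The sketch below assumes you have measured the per-worker throughput yourself; workers_needed is a hypothetical helper, and the 70% headroom default is an arbitrary assumption.

```python
import math

def workers_needed(target_rps, measured_rps_per_worker, headroom=0.7):
    """Estimate worker processes required to generate target_rps.

    measured_rps_per_worker: throughput one worker sustained in a trial run.
    headroom: fraction of that capacity to rely on (assumed, not a Locust value).
    """
    if measured_rps_per_worker <= 0 or not 0 < headroom <= 1:
        raise ValueError("capacity must be positive and headroom in (0, 1]")
    usable_rps = measured_rps_per_worker * headroom
    return math.ceil(target_rps / usable_rps)

# Example: 3,000 requests/second target, 500 requests/second measured per worker
print(workers_needed(3000, 500))  # 9
```

Since each worker is a single Python process, it makes sense to compare the result against the number of CPU cores available on your load-generation machines.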

This leads to the next challenge: scaling your load generators to accurately simulate massive traffic.
