The surprising truth about maximizing Locust users per worker is that it’s less about tuning Locust itself and more about understanding and optimizing the application under test and the environment it runs in.

Let’s see Locust in action. Imagine we’re testing a simple API endpoint that returns a greeting, using this locustfile.py:

from locust import HttpUser, task, between

class QuickstartUser(HttpUser):
    wait_time = between(1, 5)

    @task
    def hello_world(self):
        self.client.get("/hello")

We’ll run this with a single worker, hitting a local Flask app:

# app.py
from flask import Flask

app = Flask(__name__)

@app.route("/hello")
def hello():
    return "Hello, world!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Start the app:

python app.py

Start Locust:

locust -f locustfile.py --host=http://localhost:5000 --users 1000 --spawn-rate 10 --headless

As we increase --users, we’re not just telling Locust to create more users. We’re telling the Locust worker to manage more concurrent HTTP client instances, dispatch more requests, and process more responses. The bottleneck is almost always external to the Locust worker process itself.
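To get a feel for what those flags ask of the worker, here is a rough back-of-the-envelope sketch. It assumes each user loops through one request followed by one wait, which matches the locustfile above. The helper names and the 0.05 s response time are illustrative assumptions, not measurements.

```python
def ramp_up_seconds(users: int, spawn_rate: float) -> float:
    """Time for Locust to start all users at a constant spawn rate."""
    return users / spawn_rate


def estimated_rps(users: int, mean_wait_s: float, mean_response_s: float) -> float:
    """Steady-state requests per second for users that loop: request, then wait."""
    return users / (mean_wait_s + mean_response_s)


# Values from the command above: 1000 users, spawn rate 10.
ramp = ramp_up_seconds(1000, 10)

# between(1, 5) picks a uniform wait, so the mean is 3 s.
# The 0.05 s response time is an assumed figure, not a measurement.
rps = estimated_rps(1000, mean_wait_s=3.0, mean_response_s=0.05)

print(f"ramp-up: {ramp:.0f} s, steady state: about {rps:.0f} requests/s")
# prints: ramp-up: 100 s, steady state: about 328 requests/s
```

If the observed request rate falls well below an estimate like this, either the application is responding more slowly than assumed or the worker itself is falling behind.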

The core problem Locust addresses is simulating realistic user behavior under load. The challenge of "maximizing users per worker" is about efficiently using the resources of the machine running the Locust worker to generate as much realistic load as possible without that worker becoming the bottleneck. This means ensuring the worker can keep up with spawning, running, and reporting on the simulated users.

Internally, a Locust worker uses Python’s gevent library for concurrency. Each simulated user runs within its own greenlet. When a user makes an HTTP request, the greenlet yields control while waiting for the response. This cooperative multitasking is efficient, but it has limits. The primary limits are CPU and network I/O on the worker machine.
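The effect of yielding while waiting can be sketched with the standard library. Note the swap: this example uses asyncio rather than gevent, so it is an analogy for the cooperative model, not Locust’s actual internals. The function names and the 0.1 s "response time" are assumptions for illustration.

```python
import asyncio
import time


async def simulated_user(response_time_s: float) -> None:
    # Stand-in for an HTTP request: the coroutine yields to the event loop
    # while it "waits for the response", much as a greenlet yields in gevent.
    await asyncio.sleep(response_time_s)


async def run_users(count: int, response_time_s: float) -> float:
    """Run `count` simulated users concurrently and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(simulated_user(response_time_s) for _ in range(count)))
    return time.perf_counter() - start


elapsed = asyncio.run(run_users(count=1000, response_time_s=0.1))
# Run one after another, 1000 waits of 0.1 s would take about 100 s.
print(f"1000 overlapping waits finished in {elapsed:.2f} s")
```

Because the waits overlap, the total time stays close to a single wait. The cost that remains is the CPU work of scheduling and bookkeeping, which is where a worker eventually runs out of headroom.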

The key levers you control are:

  • Worker Machine Resources: CPU, RAM, and network interface speed. A more powerful machine can handle more greenlets.
  • Request Complexity: The size of responses, the number of requests per user, and the latency of the application under test directly impact how much work each greenlet has to do.
  • Locust Configuration: A lower --spawn-rate spreads out the cost of starting users, and --headless mode skips the web UI.
  • Application Under Test (AUT) Performance: This is paramount. If your AUT is slow, your Locust worker will spend most of its time waiting, and you’ll hit the AUT’s limits long before Locust’s.

Consider a scenario where your Locust worker is struggling. You might see high CPU usage on the worker machine, or network saturation. The solution isn’t to make Locust’s internal gevent loop faster; it’s to make the work it’s doing faster or less demanding.

What most people don’t realize is that even with gevent, switching between thousands of greenlets can become a CPU-bound problem on the Locust worker itself. gevent excels at I/O-bound work (waiting on the network), but scheduling a very large number of active greenlets still costs CPU. A Locust process runs on a single CPU core, so if that core is pinned at 100%, you have likely hit this limit. From there, the only ways to generate more load are to distribute it across more worker processes or to make the tasks those greenlets perform cheaper (which often means optimizing the AUT).
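One way to tell waiting apart from computing is to compare CPU time with wall-clock time. The sketch below uses only the standard library; cpu_share and busy are hypothetical helpers, and the workloads are stand-ins rather than real Locust tasks. A share near 1.0 means the process is keeping a full core busy, while a share near 0 means it is mostly waiting.

```python
import time


def cpu_share(work, *args) -> float:
    """Return CPU time divided by wall-clock time while work(*args) runs."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def busy(n: int) -> int:
    # CPU-bound stand-in for heavy per-request work such as parsing responses.
    return sum(i * i for i in range(n))


idle_share = cpu_share(time.sleep, 0.2)   # I/O-like waiting
busy_share = cpu_share(busy, 2_000_000)   # pure computation

print(f"waiting: {idle_share:.2f}, computing: {busy_share:.2f}")
```

In practice you would read the same signal from the operating system’s process monitor for the worker process. The point of the sketch is only that a worker sitting near a full core has no room left for more users.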

The next step after maximizing users per worker is usually distributing that load across multiple Locust master/worker nodes for truly massive scale.
