Locust API Load Testing: Simulate 10K Users in Python (2026)

You can make Locust spawn 10,000 users from a single machine, but it’s almost always the wrong way to do it.

Let’s see Locust in action instead of just talking about it. Imagine we’re testing a simple API endpoint that echoes back whatever you send it.

from locust import HttpUser, task, between

class EchoUser(HttpUser):
    wait_time = between(1, 2)  # Wait 1-2 seconds between tasks

    @task
    def echo_task(self):
        self.client.post("/echo", json={"message": "hello"})

This EchoUser class is the blueprint for each simulated user. When you run Locust, it creates one instance of the class per simulated user. The wait_time setting is crucial: its main job here is not to mimic human behavior but to control the rate at which each user hits your system. Without it, each user would send its next request as soon as the previous one returned, hammering the API as fast as possible.
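To make the effect of wait_time concrete, here is a rough back-of-the-envelope sketch. It assumes each user loops through "wait, send a request, receive the response", that between() picks waits uniformly, and that the average response time is known. The function name estimated_rps and the 0.1 second response time are illustrative assumptions, not part of Locust.

```python
def estimated_rps(users: int, wait_min: float, wait_max: float,
                  avg_response_s: float) -> float:
    """Approximate steady-state requests per second for looping users.

    Assumes every user repeats: wait (uniform between wait_min and
    wait_max), then one request that takes avg_response_s on average.
    """
    mean_wait = (wait_min + wait_max) / 2   # mean of a uniform wait
    cycle_time = mean_wait + avg_response_s  # seconds per request, per user
    return users / cycle_time


if __name__ == "__main__":
    # 10,000 users, between(1, 2), hypothetical 100 ms average response time
    print(estimated_rps(10_000, 1, 2, 0.1))
```

The estimate only holds while the target keeps up. As response times grow under load, each user's cycle gets longer and the achieved request rate drops.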

Now, the title mentions 10,000 users. Here’s how you’d try to do that from your terminal:

locust -f your_locustfile.py --host=http://localhost:8000 --users 10000 --spawn-rate 100

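The --users 10000 flag sets the target of 10,000 concurrent users. The --spawn-rate 100 flag means Locust will start 100 users per second until it reaches that target. Run like this, without --headless, Locust opens its web UI with these values pre-filled and waits for you to start the test.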
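The ramp-up implied by those two flags is simple arithmetic. The sketch below assumes an idealized linear ramp in which Locust always achieves the configured spawn rate; the helper names are made up for illustration.

```python
def ramp_up_seconds(target_users: int, spawn_rate: float) -> float:
    """Seconds needed to reach target_users at a constant spawn rate."""
    return target_users / spawn_rate


def users_at(t_seconds: float, target_users: int, spawn_rate: float) -> int:
    """Users running t_seconds after the test starts (idealized linear ramp)."""
    return min(target_users, int(t_seconds * spawn_rate))


if __name__ == "__main__":
    print(ramp_up_seconds(10_000, 100))  # full load is reached after this many seconds
    print(users_at(30, 10_000, 100))     # users running 30 seconds in
```

Keep the ramp in mind when reading results: statistics collected before the ramp finishes describe a smaller load than the one you asked for.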

But here’s the catch: running 10,000 users from a single machine is usually asking for trouble. Your machine will run out of open network connections (file descriptors), memory, or CPU long before it can effectively simulate 10,000 independent users. The problem isn’t Locust itself, but the fundamental limits of a single operating system and its hardware.
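One of those limits is easy to check up front: the number of file descriptors a process may hold open. The sketch below is Unix-only and rests on two assumptions that are mine, not Locust's: each simulated user holds roughly one open socket, and 256 descriptors are set aside for everything else the process needs. Both helper names are hypothetical.

```python
import resource  # Unix-only standard library module
from typing import Optional


def current_fd_soft_limit() -> Optional[int]:
    """Return this process's soft open-file limit, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


def max_users_for_fd_limit(soft_limit: int, reserved: int = 256) -> int:
    """Rough ceiling on users, assuming one open socket per user.

    `reserved` is an arbitrary allowance for log files, the web UI,
    and other descriptors the process needs.
    """
    return max(0, soft_limit - reserved)


if __name__ == "__main__":
    limit = current_fd_soft_limit()
    if limit is None:
        print("No soft limit on open files")
    else:
        print(f"Soft limit {limit}: room for about "
              f"{max_users_for_fd_limit(limit)} users")
```

A common default soft limit is 1024, which under these assumptions leaves room for well under 1,000 users before connections start failing.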

The core problem Locust solves is understanding how your application behaves under concurrent load. It’s not about how many users you can simulate from one box, but how many concurrent requests your system can handle before performance degrades. The "users" in Locust are really just a mechanism to generate requests at a certain rate. Each user independently decides when to execute a task, and the wait_time introduces pauses.

When you run Locust in standalone mode (which is what the command above does), everything runs in a single process on your local machine; there is no separate master or worker. That one process creates and runs every user instance, and it can make effective use of only one CPU core. For 10,000 users, this single process becomes the bottleneck.

The real way to scale Locust is by using its distributed mode. You run one "master" node and multiple "worker" nodes. The master coordinates the workers, and each worker handles a portion of the total user load.

Here’s how you’d set up a distributed test:

On the Master Node:

locust -f your_locustfile.py --host=http://your-api-host.com --master --web-host=0.0.0.0

--master: Designates this node as the master.
--web-host=0.0.0.0: Binds the Locust web UI to all network interfaces so other machines can reach it. The UI has no authentication by default, so don’t expose it to untrusted networks.

On each Worker Node:

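locust -f your_locustfile.py --host=http://your-api-host.com --worker --master-host=<master-ip>

--worker: Designates this node as a worker.
--master-host=<master-ip>: Tells the worker where to find the master. It defaults to 127.0.0.1, so you need it whenever the master runs on a different machine.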

Once your workers are running and connected to the master, you access the master’s web UI (by default at http://<master-ip>:8089). There, you specify the total number of users and the spawn rate. The master then distributes the user load across all connected workers. For 10,000 users, you might have 1 master and, say, 5-10 worker machines. Because a worker process uses only one CPU core, you would typically run one worker process per core on each machine. Each worker is then responsible for a fraction of the total users, which keeps the load manageable.
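To get a feel for what "a fraction of the total users" means per worker, the sketch below splits a user count as evenly as possible. It is a simplified model with a hypothetical function name, not Locust's actual dispatch code, which may assign users differently.

```python
def split_users(total_users: int, workers: int) -> list[int]:
    """Split total_users across workers as evenly as possible."""
    if workers < 1:
        raise ValueError("need at least one worker")
    base, extra = divmod(total_users, workers)
    # The first `extra` workers take one additional user each.
    return [base + 1 if i < extra else base for i in range(workers)]


if __name__ == "__main__":
    print(split_users(10_000, 8))  # eight workers, equal shares
    print(split_users(10_000, 6))  # shares differ by at most one user
```

Sizing the worker pool then becomes a question of how many users one worker process can drive on your hardware, which you can measure by ramping a single worker until its CPU saturates.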

The trick to understanding Locust’s user simulation is realizing that each "user" is an independent Python greenlet (a lightweight, cooperatively scheduled execution unit). When you run in distributed mode, the master tells workers how many users to spawn. Each worker then creates its allocated greenlets locally. The network traffic and request execution happen from the worker machines. This offloads the actual request generation and network I/O, allowing you to scale far beyond what a single machine can handle.
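Cooperative scheduling is easier to picture with a small example. The sketch below is an analogy only: it uses asyncio tasks from the standard library rather than the greenlets Locust actually uses, and the "request" is just a counter increment. What it shows is that a thousand simulated users can share one thread because each one yields control whenever it would otherwise wait.

```python
import asyncio


async def simulated_user(user_id: int, requests_made: list[int],
                         iterations: int = 3) -> None:
    """One cooperative 'user' that performs a few fake requests."""
    for _ in range(iterations):
        # Yield to the scheduler, much as a greenlet does while waiting on I/O.
        await asyncio.sleep(0)
        requests_made[user_id] += 1


async def run_users(n_users: int) -> list[int]:
    """Run n_users cooperative tasks concurrently in a single thread."""
    requests_made = [0] * n_users
    await asyncio.gather(
        *(simulated_user(i, requests_made) for i in range(n_users))
    )
    return requests_made


def total_requests(n_users: int) -> int:
    """Total fake requests made across all simulated users."""
    return sum(asyncio.run(run_users(n_users)))


if __name__ == "__main__":
    print(total_requests(1000))
```

The same property explains the single-core ceiling: all of these units take turns on one thread, so once that core is busy, adding more users only makes each of them slower.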

One subtle point often missed is how the spawn-rate interacts with the number of workers. If you set --users 10000 and --spawn-rate 100 on the master, and you have 10 workers, each worker will attempt to spawn 10 users per second (100 total spawn rate / 10 workers). If your workers are not fast enough to initialize these greenlets and start making requests, the actual spawn rate might be lower than configured, and the total user count might take longer to reach its target.
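The arithmetic in that paragraph can be sketched as follows. This is a simplified model with assumptions of my own: users and spawn rate are divided equally among workers, and each worker has a hypothetical maximum spawn rate it can sustain. The function names and the caps are illustrative, not Locust APIs.

```python
def per_worker_spawn_rate(total_spawn_rate: float, workers: int) -> float:
    """Spawn rate each worker must sustain under an equal split."""
    return total_spawn_rate / workers


def effective_ramp_seconds(total_users: int, total_spawn_rate: float,
                           worker_max_rates: list[float]) -> float:
    """Time until the slowest worker has spawned its share of users.

    worker_max_rates holds a hypothetical per-worker ceiling
    (users per second) on how fast each worker can really spawn.
    """
    workers = len(worker_max_rates)
    users_each = total_users / workers
    rate_each = total_spawn_rate / workers
    # The ramp ends when the slowest worker finishes its share.
    return max(users_each / min(rate_each, cap) for cap in worker_max_rates)


if __name__ == "__main__":
    print(per_worker_spawn_rate(100, 10))
    # All ten workers keep up with 10 users/s each.
    print(effective_ramp_seconds(10_000, 100, [10.0] * 10))
    # One worker manages only 5 users/s, so the ramp takes twice as long.
    print(effective_ramp_seconds(10_000, 100, [10.0] * 9 + [5.0]))
```

In this model a single slow worker doubles the ramp time, which is why it is worth watching the per-worker user counts in the web UI while the test ramps up.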

The next logical step after mastering distributed load generation is understanding how to simulate different user behaviors and traffic patterns using Locust’s features like task sets and custom event hooks.