Locust Baseline Tests: Establish Performance Benchmarks (2026)

The most surprising thing about Locust baseline tests is how often they’re run incorrectly, leading to benchmarks that are not just useless, but actively misleading.

Let’s see Locust in action with a simple web service. Imagine a tiny Python Flask app that just returns "Hello, World!".

from flask import Flask
import time

app = Flask(__name__)

@app.route("/")
def hello():
    time.sleep(0.01) # Simulate some work
    return "Hello, World!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Now, we want to test how many requests per second this can handle. Here’s a basic Locust file:

from locust import HttpUser, task, between

class HelloWorldUser(HttpUser):
    wait_time = between(0.1, 0.5) # Users wait between 100ms and 500ms between tasks

    @task
    def hello_world(self):
        self.client.get("/")

To run this, we’d typically start the Flask app and then run Locust from the command line:

locust -f locustfile.py --host=http://localhost:5000

This starts a web UI (usually at http://localhost:8089). You’d enter the number of users and the spawn rate, and hit "Start swarming."

Here’s what’s happening under the hood, and the mental model you need:

Users: Locust simulates concurrent users. Each "user" in Locust is a lightweight gevent greenlet, not an operating-system process or thread, so a single Locust process can run many of them at once. These users execute the tasks defined in your Locustfile.
Tasks: These are the actions your users perform. In our example, it’s self.client.get("/"). If you define several tasks, Locust picks one at random each time, weighted by the optional argument to @task (for example @task(3)), and doesn’t guarantee any order.
wait_time: This is crucial. It defines the think time between a user completing one task and starting the next. It’s not about how fast the server responds, but how frequently a user would initiate a new request. between(0.1, 0.5) means each simulated user waits between 100ms and 500ms before making its next request.
HttpUser.client: This is an instance of HttpSession, a subclass of requests.Session, that handles making HTTP requests. It keeps cookies between requests, retries only if you configure them, and crucially, it reports statistics back to Locust.
Statistics: The Locust UI shows RPS (requests per second), response times, and failure rates. These are aggregated across all simulated users.
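
To make the weighted task selection described above concrete, here is a minimal sketch in plain Python. It is not Locust’s implementation and does not import Locust; the task names view_item and checkout and the 3:1 weights are made up for illustration, standing in for what @task(3) and @task(1) would express in a Locustfile.

```python
import random
from collections import Counter

# Hypothetical tasks and weights, standing in for @task(3) and @task(1).
TASK_WEIGHTS = {"view_item": 3, "checkout": 1}


def pick_tasks(count, seed=None):
    """Draw `count` task names at random, in proportion to their weights."""
    rng = random.Random(seed)
    names = list(TASK_WEIGHTS)
    weights = [TASK_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


def task_shares(count, seed=None):
    """Return the fraction of draws that went to each task."""
    tally = Counter(pick_tasks(count, seed))
    return {name: tally[name] / count for name in TASK_WEIGHTS}


if __name__ == "__main__":
    # About three quarters of the draws should be view_item, in no fixed order.
    print(task_shares(10_000, seed=42))
```

The point of the sketch is that the weights shape the overall mix of requests, while the order of any individual user’s requests stays unpredictable.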

The goal of a baseline test is to establish a "normal" performance level under controlled conditions. You’re not trying to break the system yet; you’re trying to understand its current capabilities. This involves:

Realistic User Behavior: The wait_time should mimic how real users interact with your application. If your users are very active, wait_time should be low. If they’re more passive, it should be higher.
Sufficient Load: You need enough simulated users to reach the traffic level you consider normal, and ideally enough to see where performance starts to degrade, without pushing the system to failure. This means experimenting with the number of users.
Controlled Environment: Run tests on hardware and network conditions that are representative of your production environment. Avoid running other heavy processes on the test machine.
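
Once a run under these conditions has finished, the baseline itself is just a handful of numbers. The sketch below shows one way to reduce raw measurements to such a summary. It assumes you already have a list of response times in milliseconds and a failure count from somewhere; the Baseline class and summarize function are hypothetical names, not part of Locust.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """A few summary numbers describing one baseline run."""
    requests: int
    failures: int
    median_ms: float
    p95_ms: float

    @property
    def failure_rate(self) -> float:
        return self.failures / self.requests if self.requests else 0.0


def summarize(response_times_ms, failures=0):
    """Reduce raw response-time samples from one run to a Baseline."""
    if len(response_times_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cut_points = statistics.quantiles(response_times_ms, n=100, method="inclusive")
    return Baseline(
        requests=len(response_times_ms),
        failures=failures,
        median_ms=statistics.median(response_times_ms),
        p95_ms=cut_points[94],
    )


if __name__ == "__main__":
    # Made-up samples: 1 ms, 2 ms, ..., 100 ms, with two failed requests.
    print(summarize(list(range(1, 101)), failures=2))
```

Recording the median together with a high percentile is a deliberate choice here: an average alone can hide a slow tail that real users would notice.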

When you run the test, Locust will continuously send requests. The RPS you see in the UI is the aggregate RPS across all users. If you have 100 users, and each user makes a request every second (for example, a wait_time of constant_pacing(1)), you’d expect to see around 100 RPS if the server can keep up. Note that between(0, 0) would not do this: it removes the wait entirely, so each user sends its next request as soon as the previous response arrives. If the server can’t keep up, the RPS will be lower, and response times will increase.
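
The arithmetic behind that expectation can be sketched in a few lines. This is a back-of-the-envelope model, not something Locust computes for you: it assumes every user repeats a cycle of one request followed by one wait, and it takes the average response time as an input you supply. The function name expected_rps is made up for this example, and the 10 ms response time used below is only the sleep in the Flask app, ignoring any other overhead.

```python
def expected_rps(users, avg_wait_s, avg_response_s):
    """Estimate aggregate requests per second for closed-loop users.

    Each user sends a request, waits for the response, then sleeps for its
    wait_time, so one cycle lasts avg_response_s + avg_wait_s seconds.
    """
    cycle_s = avg_wait_s + avg_response_s
    if users < 0 or cycle_s <= 0:
        raise ValueError("users must be >= 0 and the cycle time must be > 0")
    return users / cycle_s


if __name__ == "__main__":
    # between(0.1, 0.5) averages 0.3 s of wait; assume a 0.01 s response.
    print(round(expected_rps(100, 0.3, 0.01), 1))
    # A total cycle of about one second per user gives about 100 RPS.
    print(round(expected_rps(100, 0.99, 0.01), 1))
```

If the RPS in the UI falls well short of this estimate, the average response time has probably grown, which is the first sign that the server is no longer keeping up.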

The most common mistake is setting wait_time to zero or a very small value and expecting the RPS to represent the absolute maximum throughput the server can handle. This isn’t a "user" test; it’s a request flood. The wait_time is the user’s pace, not the server’s. If you set wait_time = between(0.01, 0.01), each user pauses only 10ms after every response, so 100 users can attempt at most 10,000 RPS. Against our example app the ceiling is closer to 5,000 RPS, because its 10ms sleep makes each request-plus-wait cycle last at least 20ms. Either figure is likely to overwhelm the server (or the load generator itself), and the result is misleadingly low RPS, inflated response times, and high error rates that say little about behavior under realistic traffic.
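
One alternative to removing wait_time, offered here as an assumption rather than something Locust prescribes, is to keep the think time realistic, step the user count up across several runs, and look for the step where throughput stops scaling. The sketch below works on numbers you would copy from the Locust UI by hand. The Step tuple, the min_gain threshold, and the sample figures are all invented for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Step(NamedTuple):
    """One run at a fixed user count (values copied from the Locust UI)."""
    users: int
    rps: float
    p95_ms: float


def first_saturated_step(steps: Sequence[Step], min_gain: float = 0.5) -> Optional[Step]:
    """Return the first step whose RPS gain per added user drops below
    min_gain times the gain seen in the first increment, or None.

    Assumes every step uses a distinct user count.
    """
    if len(steps) < 3:
        return None
    ordered = sorted(steps, key=lambda s: s.users)
    first, second = ordered[0], ordered[1]
    base_gain = (second.rps - first.rps) / (second.users - first.users)
    for prev, cur in zip(ordered[1:], ordered[2:]):
        gain = (cur.rps - prev.rps) / (cur.users - prev.users)
        if gain < min_gain * base_gain:
            return cur
    return None


if __name__ == "__main__":
    # Illustrative, made-up measurements; not results from a real test.
    runs = [
        Step(10, 32.0, 12.0),
        Step(20, 64.0, 12.0),
        Step(40, 127.0, 13.0),
        Step(80, 170.0, 45.0),
        Step(160, 175.0, 310.0),
    ]
    print(first_saturated_step(runs))
```

A baseline would then be taken at a user count comfortably below the step this function returns, where throughput still grows roughly in proportion to the number of users.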

Once you have a stable baseline, the next step is to understand how your system behaves under stress, which often involves gradually increasing the load or introducing specific failure scenarios.