The surprising truth about Locust data parameterization is that it isn’t really about injecting "real user data" at all. It’s about making your simulated users behave like real ones by feeding each of them distinct data. That way caches don’t absorb an unrealistic share of the load and make your system look faster than it is.

Imagine you’re testing an e-commerce site. A user logs in, searches for "red shoes," adds them to their cart, and checks out. If every single simulated user does exactly the same sequence with the exact same product IDs and search terms, your backend might start serving responses from its cache for "red shoes" or product_id=123. This makes your load test look faster than it actually is for a diverse, real-world user base.
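To put a number on that effect, here is a toy model (not a real cache). It is an unbounded cache keyed on the search term: the first request for a term misses, and every repeat hits. The terms are made up and are handed out round-robin. Real caches also evict and expire entries, so treat the numbers as directional only.

```python
import itertools


def cache_hit_rate(search_terms, num_requests):
    """Toy model: an unbounded cache keyed on the search term.

    Terms are handed out round-robin. The first request for a term is a
    miss (the backend does real work); every later request for it is a hit.
    """
    seen = set()
    hits = 0
    for term in itertools.islice(itertools.cycle(search_terms), num_requests):
        if term in seen:
            hits += 1
        else:
            seen.add(term)
    return hits / num_requests


# Every simulated user searches for the same thing...
same = cache_hit_rate(["red shoes"], num_requests=1000)
# ...versus users drawing from 500 distinct (hypothetical) terms.
varied = cache_hit_rate([f"term-{i}" for i in range(500)], num_requests=1000)
print(same, varied)  # 0.999 0.5
```

With one shared search term, all but the first of 1,000 requests would be served from cache. Spreading the same traffic over 500 terms halves the hit rate, so the backend does far more real work.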

Here’s a locustfile that puts this into practice. It reads unique user credentials from users.csv and distinct product IDs from products.csv. Locust has no built-in CSV feeder, so the file loads the data with Python’s standard csv and itertools modules.

import csv
import itertools

from locust import HttpUser, task, between


def load_rows(path):
    # Read a CSV that has a header row into a list of dicts, e.g.
    # [{"username": "user1", "password": "pass1"}, ...]
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


class WebsiteUser(HttpUser):
    host = "http://localhost:8080"
    wait_time = between(1, 5)

    # Load user credentials from users.csv
    # Expected header: username,password
    user_file = "users.csv"

    # Load product IDs from products.csv
    # Expected header: product_id
    product_file = "products.csv"

    # Each file is read once, when the locustfile is imported. The iterators are
    # class attributes, so every WebsiteUser in this process shares them and each
    # new user takes the next row. itertools.cycle wraps back to the first row
    # once a file is exhausted.
    user_rows = itertools.cycle(load_rows(user_file))
    product_rows = itertools.cycle(load_rows(product_file))

    def on_start(self):
        # Called once when each simulated user starts: claim this user's rows.
        credentials = next(self.user_rows)
        self.username = credentials["username"]
        self.password = credentials["password"]
        self.product_id = next(self.product_rows)["product_id"]

    @task
    def browse_and_buy(self):
        # Log in with this user's own credentials
        self.client.post("/login", json={"username": self.username, "password": self.password})

        # Browse this user's product; name= groups all product URLs into one stats entry
        self.client.get(f"/products/{self.product_id}", name="/products/[id]")

        # Add to cart
        self.client.post("/cart/add", json={"product_id": self.product_id, "quantity": 1})

        # Simulate checkout
        self.client.post("/checkout", json={"user_id": self.username, "product_ids": [self.product_id]})

# Example CSV files:
# users.csv
# username,password
# user1,pass1
# user2,pass2
# user3,pass3

# products.csv
# product_id
# 101
# 102
# 103

In this example:

  • load_rows reads a CSV with a header row into a list of dictionaries keyed by column name. The headers (username, password, product_id) must therefore match the keys the code looks up.
  • user_file and product_file point to our CSVs. Each file is read once, when Locust imports the locustfile, not once per user.
  • user_rows and product_rows are itertools.cycle iterators stored on the class, so every WebsiteUser in the process shares them. When a user starts, on_start calls next() on each iterator and stores username, password, and product_id on that instance.
  • Because the iterators are shared, consecutive users receive consecutive rows. Locust runs users as gevent greenlets in a single process, and next() on these iterators never yields to another greenlet, so no lock is needed.
  • The browse_and_buy task uses these per-user values in every request. name="/products/[id]" keeps all product URLs under a single entry in Locust’s statistics.

The mental model here is that each Locust user instance is an independent agent. Parameterization, in this context, is about giving each agent unique inputs so their journey through your system is distinct. This means:

  1. Unique Identifiers: User IDs, session tokens, API keys, product SKUs, order numbers.
  2. Varied Inputs: Search terms, form data, request payloads.
  3. Sequential Data: If a user’s later actions depend on earlier ones (e.g., creating a resource before referencing it), keep that dependency chain unique per user.

The core problem this solves is test pollution. Without parameterization, your load test quickly becomes a test of your caching layer and of the happy path for a single data set. It stops being a test of your system under concurrent, varied load.

Under the hood, the row handout works like a shared cursor over each file. Every newly spawned user takes the next row, and itertools.cycle wraps back to the first row once the file runs out. If you spawn more users than a file has rows, some users will share data, so size your CSVs to at least your peak user count. If you’d rather get an error than silently reuse rows, build the iterators with iter() instead of itertools.cycle(). next() then raises StopIteration once the rows are exhausted.
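If each process loads the CSV rows itself, as a locustfile that reads local files does, distributed mode adds a catch. Every worker is a separate process that loads its own copy of the rows, so each worker starts handing out data from the top of the file, and users on different workers end up with the same credentials. One fix is to give each worker a disjoint slice of the rows. Here is a minimal sketch of that. It assumes you launch each worker with two hypothetical environment variables, WORKER_INDEX and WORKER_COUNT, that you set yourself. They are not built-in Locust settings.

```python
import os

# Hypothetical variables you set yourself when launching each worker, e.g.
#   WORKER_INDEX=0 WORKER_COUNT=3 locust -f locustfile.py --worker
# They are not built-in Locust settings; the defaults mean "single process".
WORKER_INDEX = int(os.environ.get("WORKER_INDEX", "0"))
WORKER_COUNT = int(os.environ.get("WORKER_COUNT", "1"))


def rows_for_worker(rows, worker_index, worker_count):
    """Return the share of rows that one worker should hand out.

    Worker i keeps every row whose position modulo worker_count equals i,
    so no two workers share a row and together they cover every row.
    """
    if not 0 <= worker_index < worker_count:
        raise ValueError("worker_index must be in the range [0, worker_count)")
    return rows[worker_index::worker_count]


users = [{"username": f"user{i}"} for i in range(1, 7)]
print([row["username"] for row in rows_for_worker(users, 0, 3)])  # ['user1', 'user4']

# In a locustfile, slice the loaded rows before wrapping them in an iterator:
my_rows = rows_for_worker(users, WORKER_INDEX, WORKER_COUNT)
```

Each worker then cycles only through its own share. Interleaving by position, rather than cutting the file into contiguous blocks, keeps the shares within one row of each other in size.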

The next concept you’ll likely encounter is handling more complex data relationships or stateful parameterization, where one user’s data generation depends on the outcome of a previous request within their own test session.
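Here is a minimal sketch of the simplest stateful case: a value produced by one request feeds the next request in the same user’s session. The /orders endpoints and the "id" response field are hypothetical placeholders for your own API. The helper takes the client as a parameter so it can be exercised without a running server.

```python
def create_then_fetch(client, product_id):
    """Create an order, then fetch it using the ID the server returned.

    `client` is expected to behave like Locust's self.client (an HttpSession):
    post()/get() return responses with a .json() method, and get() accepts
    Locust's name= keyword. The /orders endpoints and the "id" response field
    are hypothetical placeholders for your own API.
    """
    response = client.post("/orders", json={"product_id": product_id, "quantity": 1})
    # This value exists only because of this user's own earlier request.
    order_id = response.json()["id"]
    # name= groups every /orders/<id> URL under one entry in Locust's stats.
    client.get(f"/orders/{order_id}", name="/orders/[id]")
    return order_id
```

Inside a task you would call create_then_fetch(self.client, self.product_id). Because order_id is a local value in the calling user’s task, no other simulated user can pick it up, which keeps the dependency chain unique per user.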
