k6 API Performance Testing: Validate Under Real Load (2026)

The most surprising thing about k6 API performance testing is that its "real load" often isn’t real enough without deliberate effort.

Let’s see k6 in action. Imagine we’re testing a simple /users endpoint that returns a list of users.

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 100, // Number of virtual users
  duration: '30s', // Test duration
};

export default function () {
  http.get('http://localhost:3000/users');
  sleep(1); // Simulate think time
}

This script spins up 100 virtual users for 30 seconds, each making a request to /users and then sleeping for one second. It’s a good start, but "real load" means more than just concurrent users.

Load testing answers a specific question: how does your API behave when it’s actually being hammered, not just tickled? Performance isn’t just about how fast a single request is, but how many requests your system can handle concurrently before latency, error rate, or throughput start to degrade.

Internally, k6 simulates this by managing a pool of VUs. Each VU runs your script’s default function in a loop, in its own JavaScript runtime, independently of the others. The vus and duration options control how many VUs are active and for how long. The http module handles the actual network requests, and sleep introduces pauses to mimic user think time.

But what makes load "real"?

1. Realistic User Behavior: Your API isn’t just hit by one type of request. Users browse, click, search, and perform sequences of actions. Simulating this means varying your requests.

import http from 'k6/http';
import { sleep, check } from 'k6';
import { randomItem } from 'https://jslib.k6.io/k6-utils/1.2.0/index.js';

export const options = {
  vus: 50,
  duration: '1m',
  thresholds: {
    http_req_failed: ['rate<0.01'], // http errors should be less than 1%
    http_req_duration: ['p(95)<500'], // 95% of requests should be below 500ms
  },
};

const BASE_URL = 'http://localhost:3000';
const endpoints = ['/users', '/users/123', '/products?category=electronics'];

export default function () {
  const endpoint = randomItem(endpoints);
  const res = http.get(`${BASE_URL}${endpoint}`);

  check(res, {
    'status is 200': (r) => r.status === 200,
  });

  sleep(Math.random() * 2 + 0.5); // Random think time between 0.5s and 2.5s
}

Here, we introduce randomness in both the endpoints hit and the think time between requests, making the load pattern more dynamic.
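Real traffic is rarely spread evenly across endpoints, so a uniform random pick can still misrepresent the mix. Below is a minimal sketch of a weighted choice in plain JavaScript. The weights are illustrative assumptions, not measurements, and pickWeighted is a hypothetical helper rather than part of k6.

```javascript
// Hypothetical traffic mix: the weights are illustrative, not measured.
const weightedEndpoints = [
  { path: '/users', weight: 6 },
  { path: '/users/123', weight: 3 },
  { path: '/products?category=electronics', weight: 1 },
];

// Return one path, chosen with probability proportional to its weight.
// `rand` is a number in [0, 1); it defaults to Math.random() and can be
// passed explicitly to make the choice reproducible.
function pickWeighted(items, rand = Math.random()) {
  const total = items.reduce((sum, item) => sum + item.weight, 0);
  let remaining = rand * total;
  for (const item of items) {
    remaining -= item.weight;
    if (remaining < 0) return item.path;
  }
  // Fallback for floating-point rounding at the very top of the range.
  return items[items.length - 1].path;
}
```

In the script above, pickWeighted(weightedEndpoints) would take the place of randomItem(endpoints).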

2. Realistic Data: If your API depends on specific data shapes or sizes, your load tests should reflect that. Fetching 10 users is different from fetching 1000.

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 100,
  duration: '1m',
};

export default function () {
  const numUsers = Math.floor(Math.random() * 100) + 1; // Random number of users between 1 and 100
  http.get(`http://localhost:3000/users?limit=${numUsers}`);
  sleep(1);
}

This simulates varying page sizes or list fetches.

3. Realistic Throughput (RPS): Often, you don’t want to just ramp up VUs. You want to test if your API can sustain a specific number of requests per second (RPS).

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  // Instead of VUs, we target a specific RPS
  // This requires a bit more dynamic control or a different k6 approach
  // For simplicity in this example, we'll stick to VUs but acknowledge RPS as a target.
  // A common pattern is to find the VUs needed to hit a target RPS.
  vus: 100, // Start with a number and adjust based on observed RPS
  duration: '30s',
};

export default function () {
  http.get('http://localhost:3000/users');
  // Sleep duration is key to controlling RPS when VUs are fixed
  // If each VU makes a request and sleeps for 1s, and you have 100 VUs,
  // you're aiming for roughly 100 RPS (ignoring request time).
  // To target 500 RPS with 100 VUs, each VU must complete 5 cycles per second,
  // i.e. one cycle (request + sleep) every 200ms (100 VUs / 0.2s = 500 RPS).
  // This implies sleep(0.2 - request_time).
  // For precise RPS, you'd often use k6's scenarios and target rates.
  sleep(1);
}
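The relationship between VUs, cycle time, and throughput in a fixed-VU test is plain arithmetic, and it can help to write it down before picking numbers. The sketch below assumes each iteration makes exactly one request and then sleeps. The function names are hypothetical helpers, not k6 APIs.

```javascript
// Approximate requests per second for a fixed-VU test in which each
// iteration makes one request and then sleeps.
function estimateRps(vus, requestMs, sleepMs) {
  return (vus * 1000) / (requestMs + sleepMs);
}

// Smallest number of VUs needed to reach targetRps with the same cycle.
function vusForTargetRps(targetRps, requestMs, sleepMs) {
  return Math.ceil((targetRps * (requestMs + sleepMs)) / 1000);
}

console.log(estimateRps(100, 0, 1000));     // 100: 100 VUs, 1s sleep, request time ignored
console.log(estimateRps(100, 250, 1000));   // 80: same VUs once requests take 250ms
console.log(vusForTargetRps(500, 50, 150)); // 100: VUs for 500 RPS with a 200ms cycle
```

The second line shows why fixed VUs are a weak way to hold a rate: as soon as the API slows down, the achieved RPS drops with it.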

The scenarios option in k6 is where you truly define complex load profiles, including targeting specific RPS.

import http from 'k6/http';

export const options = {
  scenarios: {
    constant_request_rate: {
      executor: 'constant-arrival-rate',
      rate: 100, // Start 100 iterations per timeUnit (one request each here, so ~100 RPS)
      timeUnit: '1s', // per second
      duration: '1m',
      preAllocatedVUs: 50, // Allocate VUs beforehand
      maxVUs: 100, // Max VUs to allow scaling
    },
  },
};

export default function () {
  http.get('http://localhost:3000/users');
  // With constant-arrival-rate, k6 starts new iterations at the configured rate,
  // regardless of how long each one takes. Adding sleep() here lengthens each
  // iteration, so more VUs are needed to sustain the same rate.
}

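This constant-arrival-rate executor is the most direct way to simulate a steady stream of requests hitting your API, regardless of how long individual requests take. If responses slow down so much that all maxVUs are busy, k6 cannot start new iterations on schedule and counts them in the dropped_iterations metric, which is a signal that either the API or the VU budget is the bottleneck.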

4. Realistic Error Handling and Edge Cases: What happens when your API returns errors? Or when data is missing? Your load tests should not only check for success but also simulate scenarios that might trigger these conditions.

import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
  vus: 50,
  duration: '1m',
};

export default function () {
  const userId = Math.random() < 0.1 ? 9999 : Math.floor(Math.random() * 100) + 1; // 10% chance of non-existent user
  const res = http.get(`http://localhost:3000/users/${userId}`);

  check(res, {
    'status is 200 or 404': (r) => r.status === 200 || r.status === 404,
  });

  sleep(1);
}

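This script tests both successful user fetches and cases where a user might not be found (simulating a 404). Note that the check only affects the checks metric: by default k6 still counts a 404 response as a failed request in http_req_failed, so a threshold on that metric would need to account for the expected 404s, for example by declaring them with http.setResponseCallback(http.expectedStatuses(200, 404)).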

The levers you control most directly are the options object in your k6 script: vus, duration, stages (for ramp-up/down), and scenarios (for more complex rate/arrival patterns). The script logic itself, using http, sleep, check, and random functions, defines the "shape" of the load.
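As a sketch of the stages option mentioned above, the fragment below ramps from 0 to 50 VUs, holds that level, then ramps back down. The durations and targets are illustrative assumptions, not recommendations. It would replace vus and duration in the options object of any of the fixed-VU scripts above.

```javascript
export const options = {
  stages: [
    { duration: '30s', target: 50 }, // ramp up from 0 to 50 VUs
    { duration: '1m', target: 50 },  // hold at 50 VUs
    { duration: '30s', target: 0 },  // ramp down to 0 VUs
  ],
};
```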

One of the most powerful, yet often overlooked, aspects of k6 for simulating "real" load is its ability to leverage external data. Instead of hardcoding random numbers or endpoints, you can read data from files (like CSV or JSON) to drive your requests. This allows you to simulate realistic user sessions, API keys, or data payloads that are far more representative of production traffic. For instance, you might have a file containing thousands of valid user IDs and their associated actions, and your k6 script would iterate through this file, making requests that mimic an actual user’s journey through your application.
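Here is a minimal sketch of the data-driven idea, kept as plain JavaScript so the selection logic is visible on its own. The rows, the action names, and the /users/{id}/orders path are hypothetical. In a k6 script, the rows would typically be loaded once in the init context (for example with open() and JSON.parse, wrapped in a SharedArray from k6/data so that VUs share one copy) rather than written inline.

```javascript
// Hypothetical rows, standing in for the parsed contents of a data file.
const sessions = [
  { userId: 101, action: 'profile' },
  { userId: 102, action: 'orders' },
  { userId: 103, action: 'profile' },
];

// Hypothetical mapping from an action name to a URL path.
const pathFor = {
  profile: (id) => `/users/${id}`,
  orders: (id) => `/users/${id}/orders`,
};

// Select a row for a given iteration number, wrapping around at the end
// of the data so long tests keep cycling through it.
function rowForIteration(rows, iteration) {
  return rows[iteration % rows.length];
}

// Turn a row into the URL the script would request.
function urlForRow(baseUrl, row) {
  return `${baseUrl}${pathFor[row.action](row.userId)}`;
}
```

Inside the default function, the result of urlForRow would be passed to http.get, with the iteration number taken from a counter such as k6’s built-in __ITER variable.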

The next concept you’ll likely grapple with is distributed load testing, where you need to simulate load from multiple machines to overcome the limitations of a single test runner.