k6 Soak Test: Run Hours-Long Stability Tests (2026)

Soak tests aren’t about how much load you can throw at a system, but how long it can handle a sustained load without degrading.

Let’s watch a k6 soak test in action. Imagine we have a simple API that serves user profiles. We’ll run a k6 script that hits this API repeatedly for an extended period, simulating real-world, continuous usage.

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 100 }, // Ramp up to 100 users over 1 minute
    { duration: '1h', target: 100 }, // Stay at 100 users for 1 hour
    { duration: '1m', target: 0 },  // Ramp down to 0 users over 1 minute
  ],
  thresholds: {
    http_req_failed: ['rate<0.01'], // Fewer than 1% of requests may fail
    http_req_duration: ['p(95)<500'], // 95% of requests must complete in under 500ms
  },
};

export default function () {
  // Simulate fetching a user profile
  http.get('http://localhost:3000/users/123');
  sleep(1); // Wait 1 second between requests for this virtual user
}

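We’d run this with k6 run ./soak-test.js. There is no need to pass --duration: the stages already define the user load profile over time, and because command-line options take precedence over the script’s options, that flag would replace the staged profile. Here, we ramp up to 100 concurrent virtual users, keep them there for a full hour, and then ramp down, for a total run time of about 1 hour and 2 minutes. The thresholds define acceptable performance levels. If any threshold isn’t met, k6 marks the run as failed and exits with a non-zero exit code.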
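Because a soak run can last for hours, it can be useful to stop early once a threshold is clearly broken rather than wait for the end. The following is a minimal sketch using k6’s long-form threshold syntax with abortOnFail and delayAbortEval; the 5m delay is an arbitrary value chosen for illustration, not a recommendation.

```javascript
// Long-form thresholds: each entry is an object instead of a plain string.
// abortOnFail stops the test when the threshold is crossed;
// delayAbortEval postpones that check so the ramp-up has time to settle.
// The '5m' delay is an assumed value for this sketch.
const thresholds = {
  http_req_failed: [
    { threshold: 'rate<0.01', abortOnFail: true, delayAbortEval: '5m' },
  ],
  http_req_duration: [
    { threshold: 'p(95)<500', abortOnFail: true, delayAbortEval: '5m' },
  ],
};

console.log(JSON.stringify(thresholds, null, 2));
```

In the script above, this object would take the place of the thresholds entry inside options.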

The primary goal of a soak test is to uncover issues that only manifest over time: memory leaks, resource exhaustion, connection pool depletion, or subtle race conditions that occur under constant, prolonged stress. It’s not about finding the breaking point, but the point where things start to creak.
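A whole-run p(95) threshold can hide slow drift, because a good first half hour averages out a bad last one. One way to look for "creaking" is to compare latency early in the run with latency at the end. The sketch below is plain JavaScript meant for post-processing, not part of the k6 script. It assumes you have exported raw data points (for example with k6 run --out json=results.json) and reshaped them into the hypothetical { t, ms } form shown in the comments.

```javascript
// Nearest-rank percentile of an array of numbers.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Ratio of the p95 latency in the last window to the p95 in the first window.
// samples: [{ t: secondsSinceStart, ms: requestDurationMs }] (hypothetical shape).
// A result well above 1 suggests the system slowed down as the run went on.
function p95Drift(samples, windowSeconds) {
  const end = samples.reduce((max, s) => Math.max(max, s.t), 0);
  const first = samples.filter((s) => s.t < windowSeconds).map((s) => s.ms);
  const last = samples.filter((s) => s.t >= end - windowSeconds).map((s) => s.ms);
  return percentile(last, 95) / percentile(first, 95);
}

// Synthetic data: 100ms responses at the start, 200ms at the end.
const samples = [];
for (let t = 0; t < 10; t++) samples.push({ t, ms: 100 });
for (let t = 90; t < 100; t++) samples.push({ t, ms: 200 });

console.log(p95Drift(samples, 10)); // 2
```

What ratio counts as a problem depends on the system; the point is only that the comparison is made over time rather than over the whole run.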

Internally, k6 runs the virtual users (VUs) concurrently. Each VU executes the script’s default function in a loop for as long as it is active, making an HTTP request and then sleeping for one second. The stages array controls how many VUs are active at each point in the test. k6 records metrics such as request duration, failure rate, and data sent and received for every request made by every VU.
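Since each virtual user makes one request per loop and then sleeps for a second, you can estimate how many requests the run will generate before starting it. The sketch below is a rough back-of-the-envelope calculation in plain JavaScript: it assumes linear ramps between stage targets and a fixed time per iteration, and its toSeconds helper only understands single-unit durations such as '1m' or '1h'. All function names are made up for this example.

```javascript
// Convert a single-unit duration such as '90s', '1m' or '1h' to seconds.
function toSeconds(duration) {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`unsupported duration: ${duration}`);
  const unit = { s: 1, m: 60, h: 3600 }[match[2]];
  return Number(match[1]) * unit;
}

// Approximate VU-seconds, assuming each stage ramps linearly
// from the previous stage's target to its own target.
function vuSeconds(stages, startVUs = 0) {
  let previous = startVUs;
  let total = 0;
  for (const stage of stages) {
    total += ((previous + stage.target) / 2) * toSeconds(stage.duration);
    previous = stage.target;
  }
  return total;
}

// One request per iteration; iterationSeconds = sleep time + response time.
function estimateRequests(stages, iterationSeconds) {
  return Math.floor(vuSeconds(stages) / iterationSeconds);
}

const stages = [
  { duration: '1m', target: 100 },
  { duration: '1h', target: 100 },
  { duration: '1m', target: 0 },
];

// Upper bound: assumes responses are instant, so each loop takes exactly 1s.
console.log(estimateRequests(stages, 1)); // 366000
```

Real response times make each loop longer than one second, so the actual request count will come in below this upper bound.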

The key levers you control are the stages (duration and target VUs) and thresholds. You decide how long the test runs, what the sustained load looks like, and what performance is considered acceptable. For a soak test, the duration in the stages is paramount, often set to hours or even days, with a stable target VU count.
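If you want to reuse one script for a one-hour check and an overnight run, the hold duration and VU count can be read from environment variables. This is a minimal sketch: SOAK_VUS and SOAK_DURATION are hypothetical variable names invented for this example, and buildSoakOptions is a helper of our own, not a k6 function.

```javascript
// Build k6 options for a soak run from an environment-like object.
// SOAK_VUS and SOAK_DURATION are assumed names; defaults match the script above.
function buildSoakOptions(env = {}) {
  const vus = Number(env.SOAK_VUS || 100);
  const hold = env.SOAK_DURATION || '1h';
  return {
    stages: [
      { duration: '1m', target: vus }, // Ramp up
      { duration: hold, target: vus }, // Sustained load: the soak itself
      { duration: '1m', target: 0 },   // Ramp down
    ],
    thresholds: {
      http_req_failed: ['rate<0.01'],
      http_req_duration: ['p(95)<500'],
    },
  };
}

console.log(buildSoakOptions({ SOAK_VUS: '50', SOAK_DURATION: '8h' }).stages);
```

Inside a k6 script you would write export const options = buildSoakOptions(__ENV); and pass values on the command line, for example k6 run -e SOAK_DURATION=8h ./soak-test.js.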

The most surprising thing about long-duration tests is how often seemingly unrelated system components start to impact performance. A database connection pool that’s perfectly fine under a short, sharp load might become a bottleneck after hours of steady, low-level churn because of how connections are acquired and released, or how stale connections are handled. Similarly, application-level caches can behave differently with prolonged, steady hit rates compared to bursty traffic.

Most people focus on peak load for performance testing, but soak tests reveal the hidden costs of sustained operation. The next step is often observing the system’s resource utilization (CPU, memory, network, disk I/O) on the application servers, databases, and any other critical infrastructure during the test to correlate performance degradation with specific resource constraints.