Continuous performance testing in production isn’t about finding bugs; it’s about understanding your system’s actual behavior under real-world load, not just simulated stress.
Let’s watch k6 do its thing. Imagine we have a simple API that increments a counter.
```javascript
// test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // This is key: we're not running a massive load test here.
  // We're running a small, constant load *during* normal operation.
  vus: 10, // 10 virtual users
  duration: '1m', // for 1 minute
  // Scheduling (every 5 minutes) happens outside k6, e.g. via cron.
};

export default function () {
  const res = http.get('http://your-api-host.com/increment');
  // Flag non-200 responses (assumes the endpoint returns 200 on success)
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // Pause between iterations so the test stays a light, steady load
}
```
Now, imagine this test.js being executed by k6 on a small EC2 instance, scheduled via cron to run every 5 minutes. Meanwhile, your actual users are hitting your-api-host.com.
The k6 script starts 10 VUs, and each VU requests /increment, sleeps for one second, and repeats for a minute. Because each iteration takes the request time plus the sleep, the total comes out slightly under 10 requests per second. All of this happens while your real traffic is flowing. The k6 results (latency, error rates, throughput) are sent to a time-series database like Prometheus and visualized on a Grafana dashboard alongside your application’s own metrics.
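To make the load concrete, here is a rough back-of-the-envelope estimate in plain JavaScript (not k6 code). The helper name `estimateSyntheticLoad` and the 50 ms average latency are illustrative assumptions; the estimate also ignores k6 details such as how in-flight iterations are handled when the duration ends.

```javascript
// Hypothetical helper: approximate load from a constant-VU test where each
// iteration is one request followed by sleep(). Latency is an assumed average.
function estimateSyntheticLoad({ vus, durationSec, sleepSec, avgLatencySec }) {
  // One iteration = request time + sleep time, so each VU loops slightly
  // slower than once per sleep interval.
  const iterationSec = sleepSec + avgLatencySec;
  const requestsPerSec = vus / iterationSec;
  const totalRequests = Math.round(requestsPerSec * durationSec);
  return { requestsPerSec, totalRequests };
}

// The options from test.js, plus an assumed 50 ms average response time.
const load = estimateSyntheticLoad({ vus: 10, durationSec: 60, sleepSec: 1, avgLatencySec: 0.05 });
console.log(`${load.requestsPerSec.toFixed(2)} req/s, ~${load.totalRequests} requests per run`);
```

With these assumptions the test generates about 9.5 requests per second, or roughly 571 requests per one-minute run.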
This isn’t about breaking your system. It’s about measuring its current performance characteristics. If the average latency for the /increment endpoint, as reported by k6, suddenly jumps from 50ms to 200ms during a period of normal user traffic, you know something changed. It could be a recent code deployment, a configuration drift on a dependency, or even a subtle change in the underlying infrastructure.
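One way to turn "you know something changed" into a mechanical check is to compare each run against a baseline built from recent runs. The sketch below is a hypothetical helper, not part of k6: it uses the median of previous per-run average latencies (so one noisy run does not skew the baseline) and an assumed 1.5× tolerance.

```javascript
// Median of a list of numbers (copies before sorting to avoid mutating input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Hypothetical check: is the current run's average latency well above the
// median of previous runs? maxRatio = 1.5 is an assumed tolerance.
function isLatencyRegression(previousRunsMs, currentMs, maxRatio = 1.5) {
  if (previousRunsMs.length === 0) return false; // no baseline yet
  const baseline = median(previousRunsMs);
  return currentMs > baseline * maxRatio;
}

// Assumed history of per-run averages hovering around 50 ms.
const history = [48, 52, 50, 49, 51];
console.log(isLatencyRegression(history, 55));  // false: within tolerance
console.log(isLatencyRegression(history, 200)); // true: the 50ms -> 200ms jump
```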
The problem this solves is the gap between your staging environment and production. Staging is never production. It doesn’t have the same network conditions, the same data volume, the same unpredictable user behavior, or the same noisy neighbors on shared infrastructure. Performance tests run in staging can give you a false sense of security. Continuous performance testing in production provides a constant, low-noise signal of your system’s real-world health.
Internally, k6 is well suited to this. Its VUs are lightweight, so a small instance can generate this load cheaply; what keeps the system under test safe, though, is the load you configure, not k6 itself. The key is the options object: vus and duration are set to modest values. You’re not trying to simulate peak load; you’re trying to establish a consistent, low-impact baseline. The sleep call is crucial: without it, each VU fires requests back to back, and your test could inadvertently become the primary load source. You’re a canary, not a battering ram.
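The effect of sleep can be quantified by comparing synthetic traffic to real traffic. This is a sketch under stated assumptions: the production rate of 500 requests per second and the 50 ms latency are made-up figures, and `syntheticShare` is a hypothetical helper, not a k6 API.

```javascript
// Hypothetical helper: fraction of an endpoint's traffic that is synthetic,
// for a constant-VU test where each iteration is one request plus sleep().
function syntheticShare({ vus, sleepSec, avgLatencySec, productionRps }) {
  const syntheticRps = vus / (sleepSec + avgLatencySec);
  return syntheticRps / (syntheticRps + productionRps);
}

// Assumed: 500 req/s of real traffic and a 50 ms average response time.
const withSleep = syntheticShare({ vus: 10, sleepSec: 1, avgLatencySec: 0.05, productionRps: 500 });
const withoutSleep = syntheticShare({ vus: 10, sleepSec: 0, avgLatencySec: 0.05, productionRps: 500 });
console.log(`with sleep(1): ${(withSleep * 100).toFixed(1)}%`);   // 1.9%
console.log(`without sleep: ${(withoutSleep * 100).toFixed(1)}%`); // 28.6%
```

Under these assumptions, removing the one-second sleep takes the same 10 VUs from about 9.5 to 200 requests per second, and from under 2% to nearly 29% of the endpoint’s traffic.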
The exact levers you control are the vus and duration parameters in your options. A higher vus count increases the load, but also the potential impact. A longer duration gives more data points but consumes more resources. The frequency of execution, managed by your scheduler (cron, Kubernetes CronJob, etc.), determines how quickly you detect deviations. You also tune the target endpoint: focus on critical paths and resource-intensive operations.
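The scheduling trade-off is simple arithmetic. The sketch below compares two hypothetical cron intervals for a one-minute test; `requestsPerRun` is an assumed figure (about 571, from 10 VUs looping roughly every 1.05 seconds for 60 seconds), and the detection bound is conservative: it assumes a regression starts just after a run begins and is only caught by the following run.

```javascript
// Hypothetical helper: rough cost and responsiveness of a schedule that runs
// a `durationMin`-minute test every `intervalMin` minutes.
function scheduleStats({ intervalMin, durationMin, requestsPerRun }) {
  const runsPerDay = (24 * 60) / intervalMin;
  return {
    runsPerDay,
    dutyCycle: durationMin / intervalMin,              // fraction of time under synthetic load
    worstCaseDetectionMin: intervalMin + durationMin,  // conservative upper bound
    requestsPerDay: runsPerDay * requestsPerRun,
  };
}

// Assumed ~571 requests per one-minute run.
const every5 = scheduleStats({ intervalMin: 5, durationMin: 1, requestsPerRun: 571 });
const every15 = scheduleStats({ intervalMin: 15, durationMin: 1, requestsPerRun: 571 });
console.log(every5, every15);
```

Running every 5 minutes keeps synthetic load on the system 20% of the time and detects a regression within about 6 minutes; stretching to every 15 minutes cuts daily synthetic requests to a third but pushes the detection bound to 16 minutes.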
What most people don’t realize is that k6’s output contains far more than average latency. By examining the distribution of response times (e.g., p95 and p99 percentiles on your dashboard) and the specific error codes your application returns during these low-load tests, you can pinpoint subtle regressions that an average would mask. A slight increase in 5xx errors under a consistent, low k6 load is a strong indicator of an underlying issue, even when average latency looks acceptable.
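To see how an average hides a tail, here is a small sketch over made-up samples. It uses the nearest-rank percentile, which is one common definition; k6 and dashboarding tools may interpolate differently, so exact values can vary. `summarize` is a hypothetical helper, not a k6 API.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are <= it. Multiplying before dividing keeps the rank exact.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank - 1, 0)];
}

// Hypothetical summary of one run: latency stats plus the 5xx error rate.
function summarize(latenciesMs, statusCodes) {
  const avg = latenciesMs.reduce((sum, x) => sum + x, 0) / latenciesMs.length;
  const serverErrors = statusCodes.filter((s) => s >= 500 && s < 600).length;
  return {
    avg,
    p95: percentile(latenciesMs, 95),
    p99: percentile(latenciesMs, 99),
    errorRate5xx: serverErrors / statusCodes.length,
  };
}

// Made-up run: 98 fast responses and 2 slow ones that returned 503.
const latencies = [...Array(98).fill(50), 800, 800];
const statuses = [...Array(98).fill(200), 503, 503];
console.log(summarize(latencies, statuses));
```

Against an all-50 ms baseline, the average only moves to 65 ms and p95 stays at 50 ms, while p99 jumps to 800 ms and 2% of requests are 5xx: exactly the kind of signal an average-only view misses.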
The next concept you’ll run into is defining robust alerting rules based on these continuous performance metrics.