The P95 and P99 percentile metrics in k6 aren’t averages at all. They’re critical for understanding the tail of the user experience during load testing, telling you how the slowest 5% or 1% of your requests performed.
Let’s see k6 in action. Imagine we’re testing a simple API endpoint that sometimes takes a while to respond.
```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
  thresholds: {
    // We'll define thresholds later, but this is where they go.
  },
};

export default function () {
  // k6 times each GET and records it in the http_req_duration metric.
  http.get('http://your-api-endpoint.com/data');
  // Random think time (0 to 2 s) between iterations. This paces the
  // virtual users; it does not change how long the server takes to respond.
  sleep(Math.random() * 2);
}
```
When you run this script, k6 records the duration of every request in the http_req_duration metric and prints a summary of it at the end of the test. By default, k6 summarizes Trend metrics like this one with avg, min, med, max, p(90), and p(95). p(99) appears only if you request it through the summaryTrendStats option or the --summary-trend-stats CLI flag. p(95) and p(99) are the values we’re interested in for this analysis.
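Because p(99) is not in the default summary, one way to surface it is k6’s summaryTrendStats option. The sketch below reuses the endpoint from the earlier script. The particular list of columns is our own choice for illustration, not a required set.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
  // Columns printed for every Trend metric in the end-of-test summary,
  // http_req_duration included. p(99) and p(99.9) are added explicitly.
  summaryTrendStats: ['avg', 'min', 'med', 'max', 'p(90)', 'p(95)', 'p(99)', 'p(99.9)'],
};

export default function () {
  http.get('http://your-api-endpoint.com/data');
  sleep(1);
}
```

The same columns can be requested from the command line instead, e.g. `k6 run --summary-trend-stats="avg,min,med,max,p(95),p(99)" script.js`, which is handy when you don’t want to edit the script.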
p(95) is the value at or below which 95% of your requests completed, so 5% took longer. p(99) is the same boundary for 99% of requests, leaving 1% that took longer. These are crucial because an average blends fast and slow requests into one number: it can make the system look fine while a minority of users are experiencing significant delays.
For example, if your average response time is 50ms, but your P95 is 500ms, it means that while most users are getting fast responses, a significant chunk (5%) are waiting half a second or more. This is the difference between a smooth user experience and one riddled with frustrating lag.
This is why setting thresholds on these percentiles is so important. You can define them in the options object:
```javascript
export const options = {
  vus: 10,
  duration: '30s',
  thresholds: {
    // Only requests whose `name` tag matches. By default k6 sets `name` to the requested URL.
    'http_req_duration{name:http://your-api-endpoint.com/data}': ['p(95)<500', 'p(99)<750'],
    // All HTTP requests in the test.
    'http_req_duration': ['p(95)<500', 'p(99)<750'],
  },
};
```
Here, we’re saying that for the /data endpoint (or for all requests, when the tag filter in braces is omitted), the 95th percentile must stay below 500ms and the 99th percentile below 750ms. If any of these thresholds is crossed, k6 marks it as failed and the run ends with a non-zero exit code.
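To make the pass/fail logic concrete, here is a hypothetical, simplified re-implementation in plain JavaScript (runnable in Node, not inside k6) of how an expression like 'p(95)<500' is checked against recorded durations. It is not k6’s own threshold parser: it only understands `p(N)<value` and `p(N)<=value`, and the percentile step assumes linear interpolation between sorted samples. The names `evaluateThreshold` and `evaluateAll` are illustrative.

```javascript
// Hypothetical helpers: a simplified model of percentile thresholds.
// NOT k6's parser; only 'p(N)<value' and 'p(N)<=value' are supported.

function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b); // numeric ascending sort
  if (sorted.length === 0) return NaN; // no samples, no percentile
  const pos = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  // Linear interpolation between the two neighbouring samples.
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function evaluateThreshold(durations, expr) {
  const m = /^p\((\d+(?:\.\d+)?)\)\s*(<=|<)\s*(\d+(?:\.\d+)?)$/.exec(expr);
  if (!m) throw new Error(`unsupported threshold expression: ${expr}`);
  const value = percentile(durations, Number(m[1]));
  const limit = Number(m[3]);
  const passed = m[2] === '<' ? value < limit : value <= limit;
  return { expr, value, passed };
}

function evaluateAll(durations, exprs) {
  const results = exprs.map((e) => evaluateThreshold(durations, e));
  // The set fails as soon as any single threshold fails.
  return { results, passed: results.every((r) => r.passed) };
}

// 100 durations in milliseconds: 94 fast requests and 6 slow ones.
const durations = [...Array(94).fill(120), ...Array(6).fill(700)];
const report = evaluateAll(durations, ['p(95)<500', 'p(99)<750']);
for (const r of report.results) {
  console.log(`${r.expr}: observed ${r.value.toFixed(1)}ms -> ${r.passed ? 'pass' : 'FAIL'}`);
}
console.log(report.passed ? 'all thresholds passed' : 'thresholds failed');
```

With 6 slow requests out of 100, p(95) lands on a slow sample and fails while p(99) passes, and that single failure is enough to fail the whole set.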
The mental model here is about understanding the distribution of your application’s performance, not just its central tendency. Percentiles, especially the higher ones, give you a window into the tail of that distribution: the outliers, the slowpokes, the requests that could be causing users to abandon your service. k6 calculates them by sorting the observed response times and, when the requested percentile falls between two samples, linearly interpolating between those neighbours.
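A small sketch can make the sorting step tangible. It compares two common percentile definitions on the same ten samples: nearest-rank, which always returns an observed value, and linear interpolation, which can land between two observations. We assume k6’s Trend metrics use the interpolated variant; nearest-rank is shown only for contrast, and the sample values are made up.

```javascript
// Two ways to turn sorted samples into a percentile (durations in ms).

function sortedCopy(values) {
  return [...values].sort((a, b) => a - b); // numeric, not lexicographic
}

// Nearest-rank: the smallest sample with at least p% of samples <= it.
function percentileNearestRank(values, p) {
  const s = sortedCopy(values);
  if (s.length === 0) return NaN;
  const rank = Math.max(1, Math.ceil((p / 100) * s.length)); // 1-based rank
  return s[rank - 1];
}

// Linear interpolation around position p * (n - 1) in the sorted array.
function percentileInterpolated(values, p) {
  const s = sortedCopy(values);
  if (s.length === 0) return NaN;
  const pos = (p / 100) * (s.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return s[lo] + (s[hi] - s[lo]) * (pos - lo);
}

const samples = [120, 80, 95, 400, 110, 90, 105, 100, 85, 1000];
for (const p of [50, 90, 95, 99]) {
  console.log(
    `p(${p}): nearest-rank=${percentileNearestRank(samples, p)}ms, ` +
      `interpolated=${percentileInterpolated(samples, p).toFixed(1)}ms`
  );
}
```

With only ten samples the two definitions diverge noticeably in the tail; with thousands of requests per test they usually end up much closer.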
One common misconception is that if your average is good, you’re fine. But consider this: if you have 100 requests, and 98 take 10ms while 2 take 5000ms, your average is 109.8ms. Your P99, however, is 5000ms. The average tells a story of speed, but the P99 tells a story of extreme frustration for the users behind those slow requests. This is why P95 and P99 are the go-to metrics for performance-critical applications, where even a small percentage of slow responses can have an outsized negative impact on user satisfaction and conversion rates.
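A quick check of the arithmetic, using 98 requests at 10ms and 2 at 5000ms so that the slow requests make up more than 1% of the sample. With only one slow request out of 100, P99 would sit far below that request’s duration and only the max would expose it. The percentile helper assumes linear interpolation between sorted samples.

```javascript
// Worked example: mean vs. tail percentiles for 100 response times (ms).

function percentile(values, p) {
  const s = [...values].sort((a, b) => a - b);
  const pos = (p / 100) * (s.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  // Interpolate between the two samples surrounding the position.
  return s[lo] + (s[hi] - s[lo]) * (pos - lo);
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// 98 fast requests and 2 very slow ones.
const durations = [...Array(98).fill(10), 5000, 5000];

const summary = {
  avg: mean(durations),
  p95: percentile(durations, 95),
  p99: percentile(durations, 99),
  max: Math.max(...durations),
};
console.log(summary);
```

Note that p95 here stays at the fast value: with only 2% of requests slow, P95 can look perfect while P99 exposes the problem, which is why the two are usually tracked together.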
Understanding these percentiles allows you to tune your application for consistency, not just peak performance. The next step is often correlating these response-time percentiles with other metrics, such as the error rate (http_req_failed) and the request timing breakdown (for example http_req_waiting, the time spent waiting for the server’s first byte), to narrow down why the slowest requests are slow.