GitLab CI’s k6 integration can prevent performance regressions from reaching production by automatically failing pipelines when key metrics degrade.

Here’s a live example of a k6 test and its integration into a GitLab CI job:

# .gitlab-ci.yml
performance_tests:
  stage: test
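  image:
    name: grafana/k6:latest # Official k6 image
    entrypoint: [""] # The image's entrypoint is k6 itself; clear it so GitLab can run the script lines
  script:
    - k6 run --out json=results.json -e TARGET_URL="$CI_ENVIRONMENT_URL" tests/basic.js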
  artifacts:
    when: always # Keep the results even when a failed threshold fails the job
    paths:
      - results.json
    expire_in: 1 week
  rules:
    - if: '$CI_COMMIT_BRANCH == "main" || $CI_COMMIT_BRANCH == "staging"'
      when: on_success

// tests/basic.js
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10, // Number of virtual users
  duration: '30s', // Duration of the test
  thresholds: {
    'http_req_failed': ['rate<0.01'], // Fewer than 1% of requests may fail
    'http_req_duration': ['p(95)<250'], // 95% of requests should complete in under 250ms
  },
};

export default function () {
  // TARGET_URL is passed in with `k6 run -e TARGET_URL=...`; the fallback is a placeholder
  http.get(__ENV.TARGET_URL || 'http://example.com');
  sleep(1);
}

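This setup runs a k6 test against your application’s URL and saves the raw results as a job artifact. The -e flag passes $CI_ENVIRONMENT_URL to the script as TARGET_URL; GitLab only populates that variable when the job declares an environment with a url, so set TARGET_URL explicitly if yours does not. The thresholds in the k6 script are crucial: they define acceptable performance limits. If any threshold is breached, k6 exits with a non-zero status code, causing the GitLab CI job to fail.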

The mental model for this is straightforward: you define what "good performance" looks like as a series of measurable thresholds. Your CI pipeline then acts as an automated guardian, executing these tests on every relevant commit. If the application’s performance dips below the defined acceptable levels, the pipeline halts, preventing the regressed code from being deployed.

The options object in the k6 script is where you configure the test execution. vus controls the concurrency (how many virtual users hit your app simultaneously), and duration sets how long the test runs. The thresholds section is where you encode your performance Service Level Objectives (SLOs). These should not be arbitrary numbers; they should reflect user experience and business requirements. For instance, the rate<0.01 threshold on http_req_failed means fewer than 1% of requests may fail, and the p(95)<250 threshold on http_req_duration means the 95th-percentile response time must stay below 250 milliseconds.
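To make the threshold semantics concrete, here is a minimal plain-JavaScript sketch of what the two expressions measure. It is an illustration only, not k6’s implementation: the function names and the sample data are hypothetical, and the percentile uses a simple nearest-rank method that may differ slightly from k6’s own calculation.

```javascript
// Illustration only: k6 computes these aggregates internally.

// Nearest-rank percentile over a list of durations in milliseconds.
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Fraction of requests that failed (the "rate" in 'rate<0.01').
function failureRate(samples) {
  if (samples.length === 0) return 0;
  const failed = samples.filter((s) => s.failed).length;
  return failed / samples.length;
}

// Mirrors the two thresholds from the script: rate<0.01 and p(95)<250.
function checkThresholds(samples) {
  const p95 = percentile(samples.map((s) => s.durationMs), 95);
  const rate = failureRate(samples);
  return { p95, rate, passed: rate < 0.01 && p95 < 250 };
}

// Hypothetical data: 100 requests, 94 fast and 6 slow, none failed.
const samples = [];
for (let i = 0; i < 94; i++) samples.push({ durationMs: 120, failed: false });
for (let i = 0; i < 6; i++) samples.push({ durationMs: 400, failed: false });

const result = checkThresholds(samples);
console.log(result);
```

With this sample the check fails even though no request errored: more than 5% of the requests are slow, so the 95th percentile lands on a 400 ms sample.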

To make this truly robust, you’ll want to integrate k6 with a results analysis tool or a dedicated performance testing platform. GitLab’s built-in features are good for basic pass/fail, but for trending and deeper analysis, you might stream results to a time-series store such as InfluxDB or Prometheus and chart them in Grafana, or use the hosted Grafana Cloud k6 (formerly k6 Cloud). This allows you to track performance over time, identify gradual degradations, and have more context when a test fails.
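Before adopting a full platform, a rough trend check can be built from the results.json artifact itself. The sketch below assumes the line-delimited format written by --out json, where each sample is a "Point" entry carrying a metric name and data.value. The helper names, the inline sample data, and the stored baseline are all hypothetical.

```javascript
// Sketch only: hypothetical helpers, not part of k6 or GitLab.

// Collect every sample of one metric from k6's line-delimited JSON output.
function metricValues(ndjsonText, metricName) {
  const values = [];
  for (const line of ndjsonText.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const entry = JSON.parse(line);
    if (entry.type === 'Point' && entry.metric === metricName) {
      values.push(entry.data.value);
    }
  }
  return values;
}

function mean(values) {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// True when the current mean is more than `tolerance` above the baseline.
function hasRegressed(currentMs, baselineMs, tolerance = 0.1) {
  return currentMs > baselineMs * (1 + tolerance);
}

// Inline sample standing in for results.json; in a pipeline you would read
// the file instead, e.g. require('fs').readFileSync('results.json', 'utf8').
const sampleOutput = [
  '{"type":"Metric","metric":"http_req_duration","data":{"type":"trend"}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T00:00:00Z","value":100,"tags":{"status":"200"}}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T00:00:01Z","value":140,"tags":{"status":"200"}}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T00:00:02Z","value":120,"tags":{"status":"200"}}}',
  '{"type":"Point","metric":"vus","data":{"time":"2024-01-01T00:00:02Z","value":10,"tags":null}}',
].join('\n');

const durations = metricValues(sampleOutput, 'http_req_duration');
const currentMean = mean(durations);
const baselineMeanMs = 100; // hypothetical value saved from an earlier run
console.log(currentMean, hasRegressed(currentMean, baselineMeanMs));
```

A relative comparison like this can catch a slow drift that still sits under the absolute threshold, which is the kind of gradual degradation a dashboard would also reveal.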

The most surprising aspect for many users is how little load is actually needed to detect regressions. You don’t need to simulate peak traffic to see if a change made a page 500ms slower. A small, consistent load is often sufficient to highlight the impact of code changes on response times or error rates. The key is consistency in your test environment and load profile.

The next step is to start analyzing trends and setting more granular thresholds for specific API endpoints rather than just a global http_req_duration.
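As a starting point for per-endpoint thresholds, k6 lets a threshold key select a tagged subset of a metric, in the form 'http_req_duration{name:login}'. The sketch below builds such keys from a table of limits. The endpoint names, the limits, and the buildThresholds helper are hypothetical and only show the shape of the configuration.

```javascript
// Hypothetical endpoint names and p95 limits in ms; replace with your own SLOs.
const endpointLimitsMs = { login: 300, search: 500 };

// Build k6 threshold keys of the form 'http_req_duration{name:<tag>}'.
function buildThresholds(limits) {
  const thresholds = { http_req_failed: ['rate<0.01'] };
  for (const [name, limitMs] of Object.entries(limits)) {
    thresholds[`http_req_duration{name:${name}}`] = [`p(95)<${limitMs}`];
  }
  return thresholds;
}

const options = {
  vus: 10,
  duration: '30s',
  thresholds: buildThresholds(endpointLimitsMs),
};

console.log(JSON.stringify(options.thresholds, null, 2));

// In the k6 script itself, export `options` and tag each request so that it
// matches a threshold key, for example:
//   http.get(`${__ENV.TARGET_URL}/login`, { tags: { name: 'login' } });
```

Each tagged threshold is evaluated on its own, so a slow search endpoint fails the job even if the global p95 still looks healthy.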
