The GitLab Runner process on your shared runner host lost its connection to the GitLab API, so it can no longer report job status or fetch new jobs.
Here’s what’s likely happening and how to fix it:
1. Runner Registration Token Expired or Invalid
- Diagnosis: Check the runner’s configuration file (`/etc/gitlab-runner/config.toml` on Linux when the runner runs as root, `~/.gitlab-runner/config.toml` when it runs as a regular user, or the path passed with `--config`). Look for the `token` value, which is the runner’s authentication token. Then navigate to your GitLab project or group settings -> CI/CD -> Runners. If the runner is listed, check its status and last contact time. If it’s not listed, or its token was reset or revoked, it needs re-registration.
- Fix: Create a new token from your GitLab project or group settings (CI/CD -> Runners -> "New runner"). Then stop the runner service (`sudo gitlab-runner stop`), unregister the old runner (`sudo gitlab-runner unregister --url <your_gitlab_url> --token <old_token>`), and re-register it with the new token (`sudo gitlab-runner register --url <your_gitlab_url> --registration-token <new_token> --description "My Runner"`). On recent GitLab versions, "New runner" issues a runner authentication token (prefixed `glrt-`) that is passed with `--token` instead of `--registration-token`. Start the runner service (`sudo gitlab-runner start`).
- Why it works: The registration token is used only once, to create the runner. After that, the runner authenticates every API request with the `token` stored in `config.toml`. If that token has expired, been revoked, or was copied incorrectly, GitLab rejects the runner’s requests. Re-registering issues a fresh token and restores authentication.
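Before re-registering, it can help to confirm that the stored token is really the problem. The sketch below reads `url` and `token` from `config.toml` and posts the token to GitLab’s `POST /api/v4/runners/verify` endpoint. The function names `runner_field` and `verify_runner_token` are made up for this example, and the parsing assumes simple one-line `key = "value"` entries like those written by `gitlab-runner register`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of gitlab-runner.

# Print the first `key = "value"` match for the given key in a config.toml.
# Assumes plain one-line string values; this is not a full TOML parser.
runner_field() {
  local key="$1" file="$2"
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*$/\1/p" "$file" | head -n 1
}

# Send the token to the runner verification endpoint and print the HTTP status.
# Prints 000 if no HTTP response was received at all.
verify_runner_token() {
  local url="${1%/}" token="$2"
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --request POST "${url}/api/v4/runners/verify" --form "token=${token}" || true
}
```

A possible invocation is `verify_runner_token "$(runner_field url /etc/gitlab-runner/config.toml)" "$(runner_field token /etc/gitlab-runner/config.toml)"`. A `200` suggests the token is still accepted, while `403` suggests GitLab has rejected it. `sudo gitlab-runner verify` performs a similar check for every registered runner.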
2. Network Connectivity Issues to GitLab API
- Diagnosis: From the machine where the GitLab runner is installed, `curl` your GitLab instance’s API endpoint. For example, if your GitLab instance is `gitlab.example.com`, run `curl -v https://gitlab.example.com/api/v4/runners`. Any HTTP response, even `401 Unauthorized`, shows that the network path works. Look for connection refused, timeout, or SSL certificate errors.
- Fix:
  - Firewall: Ensure no firewalls (local or network) are blocking outbound connections from the runner machine to your GitLab instance’s IP address and HTTPS port (usually 443).
  - DNS: Verify DNS resolution for your GitLab instance from the runner machine. Run `dig gitlab.example.com`. If it doesn’t resolve correctly, fix your DNS settings.
  - Proxy: If your network requires a proxy for outbound HTTPS traffic, ensure the runner is configured to use it. The runner process itself reads `HTTP_PROXY` and `HTTPS_PROXY` from its service environment (for example, a systemd drop-in for the `gitlab-runner` service). To pass the same variables to jobs, edit `/etc/gitlab-runner/config.toml` and add an `environment` list to the `[[runners]]` section. For example, for a Docker executor:

        [[runners]]
          name = "Docker Runner"
          url = "https://gitlab.example.com/"
          token = "YOUR_RUNNER_TOKEN"
          executor = "docker"
          # Proxy variables exported into every job
          environment = ["HTTP_PROXY=http://proxy.example.com:8080", "HTTPS_PROXY=http://proxy.example.com:8080"]
          [runners.docker]
            tls_verify = false
            image = "docker:latest"
            privileged = true
            disable_cache = false
            volumes = ["/cache"]
            shm_size = 0
            helper_image = "gitlab/gitlab-runner-helper:alpine-x86_64-latest"
            [[runners.docker.services]]
              name = "docker:dind"
              entrypoint = ["/usr/bin/docker", "dockerd-entrypoint.sh"]
              command = ["--storage-driver=overlay2"]
          [runners.cache]
            Type = "s3"
            Shared = true
            [runners.cache.s3]
              BucketName = "gitlab-runner-cache"
              BucketLocation = "us-east-1"

    For shell executor, add to `[[runners]]`:

        [[runners]]
          name = "Shell Runner"
          url = "https://gitlab.example.com/"
          token = "YOUR_RUNNER_TOKEN"
          executor = "shell"
          environment = ["HTTP_PROXY=http://proxy.example.com:8080", "HTTPS_PROXY=http://proxy.example.com:8080"]
- Why it works: The runner needs to communicate with the GitLab API to fetch jobs, send logs, and update statuses. Any interruption in this communication, whether due to network policy, name resolution, or proxy misconfiguration, will cause jobs to appear stuck or fail.
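To tell these network causes apart quickly, one option is to map `curl`’s exit code to the likely failure class. This is a minimal sketch: `explain_curl_exit` and `probe_gitlab` are hypothetical helper names, and only a handful of well-known `curl` exit codes are covered.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Translate a curl exit code into the most likely cause.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    5)  echo "proxy: could not resolve proxy host" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "connect: connection refused or blocked" ;;
    28) echo "timeout: no answer within the time limit" ;;
    35) echo "tls: handshake failed" ;;
    60) echo "tls: certificate not trusted" ;;
    *)  echo "other: curl exit code $1" ;;
  esac
}

# Probe the API endpoint of a GitLab instance and explain the result.
# Without --fail, curl exits 0 on any HTTP response, including 401.
probe_gitlab() {
  local url="${1%/}" rc=0
  curl --silent --show-error --output /dev/null --max-time 10 \
    "${url}/api/v4/runners" || rc=$?
  explain_curl_exit "$rc"
}
```

Running `probe_gitlab https://gitlab.example.com` from the runner host then points at the firewall, DNS, proxy, or TLS item to look at first.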
3. GitLab API Server Issues
- Diagnosis: Check the health of your GitLab instance (`https://gitlab.example.com/admin/health_check` for self-hosted, or the official GitLab status page for GitLab.com). Look for any reported incidents or performance degradation affecting the API or CI services.
- Fix: If there are known issues with your GitLab instance, the fix is to resolve those underlying problems. On a self-hosted instance this might involve restarting GitLab services (`sudo gitlab-ctl restart`) or increasing server resources. On GitLab.com it means waiting for the GitLab team to resolve the incident.
- Why it works: If the GitLab API itself is down or overloaded, the runner has no endpoint to communicate with, leading to failures.
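A quick way to separate a server-side problem from a network problem is to look at the HTTP status the API returns. The sketch below is an assumption-laden illustration: `classify_http_status` and `api_status` are hypothetical names, and the grouping of status codes is a rough rule of thumb rather than anything GitLab defines.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the status grouping are illustrative.

# Classify an HTTP status code as reported by curl's %{http_code}.
classify_http_status() {
  case "$1" in
    000) echo "no-response" ;;           # curl got no HTTP reply: network-level issue
    429) echo "rate-limited" ;;
    5??) echo "server-error" ;;          # GitLab or its reverse proxy is failing
    2??|3??|401|403|404) echo "api-up" ;; # the server answered, even if it refused
    *)   echo "unexpected" ;;
  esac
}

# Print the HTTP status code returned by the instance's API (000 on failure).
api_status() {
  curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' "${1%/}/api/v4/runners" || true
}
```

For example, `classify_http_status "$(api_status https://gitlab.example.com)"` returning `server-error` points at the GitLab instance itself, while `no-response` points back at the network checks in the previous section.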
4. Runner Resource Exhaustion (CPU/Memory/Disk)
- Diagnosis: On the runner machine, use tools like `top`, `htop`, `free -m`, and `df -h` to check CPU usage, available RAM, and free disk space. If the runner machine is consistently maxed out on resources, it might not be able to process API requests or start new jobs.
- Fix:
  - Increase Resources: Allocate more CPU, RAM, or disk space to the runner machine.
  - Optimize Jobs: Review your `.gitlab-ci.yml` to identify jobs that are excessively resource-intensive. Optimize scripts, reduce build artifact sizes, or use caching more effectively.
  - Runner Configuration: For Docker executors, consider limiting container resources using the `cpus` and `memory` settings in `config.toml`. Both values are strings:

        [[runners]]
          executor = "docker"
          [runners.docker]
            cpus = "2"
            memory = "4g"
- Why it works: A runner that is starved for resources may become unresponsive. It can’t allocate memory to process API responses, spin up new job environments, or write logs, all of which can manifest as connection or pipeline failures.
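The manual checks above can be wrapped in a small script that warns when the host is running low. This is a sketch for Linux hosts: the function names and the default thresholds (90% disk usage, 512 MiB available memory) are arbitrary choices for illustration, not recommended values.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and thresholds are hypothetical.

# Percentage of space used on the filesystem that holds the given path.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Available memory in MiB, read from MemAvailable in /proc/meminfo (Linux only).
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Warn and return non-zero if disk usage or available memory crosses a threshold.
check_runner_resources() {
  local path="${1:-/}" max_disk="${2:-90}" min_mem="${3:-512}"
  local disk mem status=0
  disk="$(disk_used_pct "$path")"
  mem="$(mem_available_mib)"
  if [ "$disk" -ge "$max_disk" ]; then
    echo "WARN: ${disk}% of disk used on ${path}"
    status=1
  fi
  if [ "$mem" -lt "$min_mem" ]; then
    echo "WARN: only ${mem} MiB of memory available"
    status=1
  fi
  return "$status"
}
```

Calling `check_runner_resources /var/lib/docker 85 1024` from cron, for instance, would flag a filling Docker data directory before jobs start to fail. The path here is only an example.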
5. Incorrect Runner Configuration in config.toml
- Diagnosis: Carefully review the `/etc/gitlab-runner/config.toml` file for syntax errors, incorrect URLs, or invalid executor settings. An invalid `url` or `token` will prevent authentication. Incorrect executor settings (e.g., missing Docker daemon configuration for a Docker executor) will prevent job execution.
- Fix: Correct any typos, ensure the `url` points to your GitLab instance’s root (e.g., `https://gitlab.example.com/`), and verify that the `token` is valid. If using the Docker executor, ensure the `[runners.docker]` section is correctly configured. After editing, restart the runner: `sudo gitlab-runner restart`.
- Why it works: The `config.toml` file is the runner’s primary configuration source. Any malformed entries will prevent the runner from operating correctly, leading to communication or execution failures.
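Some of these mistakes can be caught with a few pattern checks before restarting the runner. The following is a rough sketch, not a TOML parser: `lint_runner_config` is a hypothetical helper, it only looks at line-level patterns, and with several `[[runners]]` entries it passes as soon as any one of them matches.

```shell
#!/usr/bin/env bash
# Sketch only: heuristic grep checks, not a real TOML validation.

# Report obvious problems in a runner config file; return non-zero if any found.
lint_runner_config() {
  local file="$1" problems=0
  local ws='[[:space:]]*'

  if ! grep -Eq "^${ws}url${ws}=${ws}\"https?://[^\"]+\"" "$file"; then
    echo "url is missing or does not start with http:// or https://"
    problems=$((problems + 1))
  fi
  if ! grep -Eq "^${ws}token${ws}=${ws}\"[^\"]+\"" "$file"; then
    echo "token is missing or empty"
    problems=$((problems + 1))
  fi
  if ! grep -Eq "^${ws}executor${ws}=${ws}\"[^\"]+\"" "$file"; then
    echo "executor is missing or empty"
    problems=$((problems + 1))
  fi
  # A Docker-based executor normally needs a [runners.docker] section.
  if grep -Eq "^${ws}executor${ws}=${ws}\"docker" "$file" &&
     ! grep -Eq "^${ws}\[runners\.docker\]" "$file"; then
    echo "docker executor configured without a [runners.docker] section"
    problems=$((problems + 1))
  fi
  [ "$problems" -eq 0 ]
}
```

A typical use would be `lint_runner_config /etc/gitlab-runner/config.toml && sudo gitlab-runner restart`, so the restart only happens when none of the checks complain.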
6. TLS/SSL Certificate Issues
- Diagnosis: If your GitLab instance uses a self-signed certificate or a certificate from a private CA, the runner might not trust it. The `curl -v` command mentioned earlier will show SSL handshake errors. In the runner logs (`sudo gitlab-runner status`, then `sudo journalctl -u gitlab-runner -f`), you might see `x509: certificate signed by unknown authority` or similar TLS errors.
- Fix:
  - Option A (Recommended): Make the runner trust your GitLab instance’s CA certificate. On Linux, this typically involves saving the PEM-encoded certificate as `/etc/gitlab-runner/certs/<gitlab-hostname>.crt` (for example, `/etc/gitlab-runner/certs/gitlab.example.com.crt`) and ensuring the runner process has read permissions. The runner picks up a file at that location when it runs as root.
  - Option B (Explicit CA file): If the certificate lives elsewhere, point the runner at it with `tls-ca-file` in the `[[runners]]` section for the relevant runner. Do not rely on `tls_verify = false` here: that key belongs to `[runners.docker]` and controls only the connection to the Docker daemon, so it does not change how the runner verifies the GitLab server. The certificate path below is a placeholder.

        [[runners]]
          name = "My Runner"
          url = "https://gitlab.example.com/"
          token = "YOUR_RUNNER_TOKEN"
          executor = "docker"
          tls-ca-file = "/path/to/gitlab-ca.crt"  # PEM file with your CA certificate
          [runners.docker]
            tls_verify = false  # Docker daemon connection only; unrelated to GitLab's certificate

  After making changes, restart the runner: `sudo gitlab-runner restart`.
- Why it works: Secure communication (HTTPS) between the runner and the GitLab API relies on valid TLS certificates. If the runner cannot verify the identity of the GitLab server due to an untrusted certificate, it will refuse to connect.
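To place a certificate where the runner looks for it, the file name has to match the hostname in the runner’s `url`. The sketch below derives that path and, optionally, saves the certificates the server presents. `runner_cert_path` and `fetch_server_chain` are hypothetical helper names, and the path assumes a Linux runner executed as root.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Derive the certificate path for a GitLab URL: scheme, port, and path are dropped.
runner_cert_path() {
  local host="${1#*://}"   # strip the scheme
  host="${host%%/*}"       # strip any path
  host="${host%%:*}"       # strip any port
  echo "/etc/gitlab-runner/certs/${host}.crt"
}

# Save the PEM certificates presented by host:port into the given file.
fetch_server_chain() {
  local host="$1" port="${2:-443}" out="$3"
  openssl s_client -connect "${host}:${port}" -servername "$host" -showcerts \
    </dev/null 2>/dev/null |
    sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' > "$out"
}
```

For example, `fetch_server_chain gitlab.example.com 443 "$(runner_cert_path https://gitlab.example.com/)"` writes the presented chain to the expected location. Certificates fetched this way are only as trustworthy as the connection they came over, so compare the fingerprint with one obtained from your GitLab administrator before relying on them.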
Once all of these are fixed, the next error you’ll likely see is `Build cancelled` or a job timeout, which usually means the job itself is hanging or stuck in a loop.