The GitLab CI Runner system failed because the gitlab-runner service, which is responsible for picking up and executing CI jobs, could not connect to the GitLab API. This means no jobs were being picked up, and existing jobs might have timed out.
Common Causes and Fixes

Runner Not Registered or Authentication Token Expired

- Diagnosis: Check the runner’s status in your GitLab project/group settings under "CI/CD" -> "Runners." If it shows as disconnected or has a red dot, the registration might be invalid. You can also check the runner’s configuration file (`/etc/gitlab-runner/config.toml`) for the `token` field.
- Fix: Re-register the runner. On the runner machine, execute:

      sudo gitlab-runner register \
        --url "https://your.gitlab.instance.com/" \
        --registration-token "YOUR_REGISTRATION_TOKEN" \
        --description "my-new-runner" \
        --tag-list "docker,aws" \
        --executor "docker" \
        --docker-image "alpine:latest"

  Replace `"https://your.gitlab.instance.com/"` with your GitLab instance URL and `"YOUR_REGISTRATION_TOKEN"` with a fresh token obtained from your GitLab project/group’s CI/CD settings, and adjust the description, tags, and executor as needed. This command re-establishes the secure link between the runner and GitLab using the latest credentials. On newer GitLab releases the registration-token workflow is deprecated: there you create the runner in the GitLab UI and pass its authentication token with `--token` instead.
- Why it works: The registration token is only used during registration. GitLab exchanges it for a runner authentication token, which is stored in the `token` field of `config.toml`. If that authentication token expires or is revoked, or the runner is deleted in GitLab, the runner loses its authorization to communicate with the GitLab API. Re-registration provides a new, valid token.
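The `token` check from the diagnosis step can be scripted. The following is a minimal sketch, not part of GitLab Runner: `has_runner_token` is a hypothetical helper name, and it only tests that some non-empty `token = "..."` line exists in the file, not that GitLab still accepts it.

```shell
# Hypothetical helper: succeeds if the given config.toml contains a
# non-empty runner token. The token itself is never printed.
has_runner_token() {
  # Matches lines such as:   token = "abc123"
  # (at least one character between the quotes; keys like
  # token_obtained_at do not match).
  grep -Eq '^[[:space:]]*token[[:space:]]*=[[:space:]]*"[^"]+"' "$1"
}

# Example call, using the path from the text above:
#   if has_runner_token /etc/gitlab-runner/config.toml; then
#     echo "token present"
#   else
#     echo "token missing or empty"
#   fi
```

A token that is present can still be rejected by the server. Running `sudo gitlab-runner verify` asks GitLab to check each runner listed in the configuration.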

Network Connectivity Issues

- Diagnosis: From the runner machine, try to `curl` your GitLab instance:

      curl -v https://your.gitlab.instance.com/api/v4/runners

  Look for connection timeouts or SSL certificate errors. Because the request is unauthenticated, an HTTP error status such as 401 still shows that the server is reachable. Also, check firewall rules on the runner machine and any network firewalls between the runner and GitLab.
- Fix: If `curl` fails, ensure the runner machine can resolve and reach your GitLab instance’s hostname and port (usually 443 for HTTPS). If using a corporate network, you might need to configure a proxy. The runner process itself takes its proxy from the `HTTP_PROXY` and `HTTPS_PROXY` environment variables of the `gitlab-runner` service (for example, set in a systemd drop-in), while the `environment` key in the runner’s `config.toml` passes proxy variables to the jobs:

      [[runners]]
        url = "https://your.gitlab.instance.com/"
        token = "..."
        executor = "docker"
        # Uncomment this line if jobs need a proxy
        # environment = ["HTTP_PROXY=http://your.proxy.server:8080", "HTTPS_PROXY=http://your.proxy.server:8080"]
        [runners.docker]
          tls_verify = false
          image = "alpine:latest"
          privileged = true
          disable_cache = false
          volumes = ["/cache"]
        [runners.cache]
          [runners.cache.s3]
          [runners.cache.gcs]
          [runners.cache.azure]

  Restart the `gitlab-runner` service after making changes.
- Why it works: The runner needs an unhindered network path to the GitLab API to poll for jobs and report status. Proxy settings ensure traffic is routed correctly through intermediary network devices.
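When the `curl` check is run from a script, its exit code already separates DNS, connection, timeout, and TLS failures. The sketch below maps the documented curl exit codes to short hints; `explain_curl_exit` is an illustrative name, and the hint wording is an assumption rather than curl output.

```shell
# Hypothetical helper: translate a curl exit code into a short hint.
# The numeric codes are the ones listed in the curl manual.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok: request completed" ;;
    5)  echo "proxy: could not resolve proxy" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "connect: failed to connect to host" ;;
    28) echo "timeout: operation timed out" ;;
    35) echo "tls: SSL/TLS handshake failed" ;;
    60) echo "tls: server certificate could not be verified" ;;
    *)  echo "other: see the EXIT CODES section of the curl manual" ;;
  esac
}

# Example call, using the URL from the text above:
#   curl -sS -o /dev/null --max-time 10 "https://your.gitlab.instance.com/api/v4/runners"
#   explain_curl_exit "$?"
```

Codes 6, 7, and 28 point at DNS, firewall, or routing problems, while 35 and 60 point at the certificate issues covered later in this article.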

Incorrect GitLab Instance URL in `config.toml`

- Diagnosis: Examine the `config.toml` file (`/etc/gitlab-runner/config.toml`) on the runner machine. Look for the `url` parameter under the `[[runners]]` section.
- Fix: Ensure the `url` is exactly correct, including `https://` and the correct domain. For example:

      [[runners]]
        url = "https://gitlab.example.com/"
        # ... other configurations

  After correcting the URL, restart the `gitlab-runner` service:

      sudo systemctl restart gitlab-runner

- Why it works: A mistyped URL means the runner is attempting to connect to a non-existent or incorrect server, preventing any communication.
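A quick shape check can catch the most common URL typos (missing scheme, stray spaces) before the service is restarted. This is a rough sketch under stated assumptions: `config_urls` and `is_plausible_gitlab_url` are hypothetical helpers, the pattern is deliberately simple, and a URL that passes may still point at the wrong server.

```shell
# Hypothetical helper: print every url value found in a config.toml.
config_urls() {
  sed -n 's/^[[:space:]]*url[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Hypothetical helper: succeeds if the argument looks like
# http(s)://host[:port][/path] with no whitespace in it.
is_plausible_gitlab_url() {
  printf '%s\n' "$1" |
    grep -Eq '^https?://[A-Za-z0-9.-]+(:[0-9]+)?(/[^[:space:]]*)?$'
}

# Example call, using the path from the text above:
#   config_urls /etc/gitlab-runner/config.toml | while IFS= read -r u; do
#     is_plausible_gitlab_url "$u" || echo "suspicious url: $u"
#   done
```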

`gitlab-runner` Service Not Running or Crashing

- Diagnosis: Check the status of the `gitlab-runner` service:

      sudo systemctl status gitlab-runner

  Look for "active (running)" or "inactive (dead)". If it’s not running or has recently failed, check the logs for errors:

      sudo journalctl -u gitlab-runner -f

- Fix: Start or restart the service:

      sudo systemctl start gitlab-runner
      sudo systemctl restart gitlab-runner

  If it keeps crashing, investigate the `journalctl` output for specific errors (e.g., disk full, out of memory, configuration parsing errors) and address those underlying issues.
- Why it works: The `gitlab-runner` service is the daemon that continuously polls GitLab for jobs and manages job execution. If it’s not running, no jobs can be processed.
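Scanning the journal by eye gets tedious when the service crash-loops. The filter below is a small sketch that reads log text on standard input and prints one hint per recognised message. The function name `classify_runner_log` and the set of patterns are assumptions chosen for illustration; real logs may word these errors differently.

```shell
# Hypothetical helper: read log text on stdin, print one hint per
# recognised error message. Prints nothing if no pattern matches.
classify_runner_log() {
  awk '
    /certificate signed by unknown authority/ { unknown_ca = 1 }
    /certificate has expired/                 { expired = 1 }
    /[Nn]o space left on device/              { disk = 1 }
    /[Oo]ut of memory/                        { oom = 1 }
    END {
      if (unknown_ca) print "tls: CA not trusted by the runner"
      if (expired)    print "tls: server certificate expired"
      if (disk)       print "disk: filesystem is full"
      if (oom)        print "memory: a process was killed or failed to allocate"
    }'
}

# Example call:
#   sudo journalctl -u gitlab-runner --since "1 hour ago" --no-pager | classify_runner_log
```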

Resource Constraints on the Runner Machine

- Diagnosis: Monitor the CPU, RAM, and disk space on the machine hosting the GitLab Runner. Use commands like `top`, `htop`, `free -h`, and `df -h`. If the runner machine is starved of resources, the `gitlab-runner` process might become unresponsive or crash.
- Fix: Allocate more resources to the runner machine (e.g., increase RAM, CPU, or disk space). If using Docker executors, ensure the Docker daemon itself has sufficient resources and that child processes aren’t being OOM-killed. Free up disk space if it’s full.
- Why it works: The runner process and any spawned job processes (like Docker containers) require system resources to operate. Insufficient resources lead to instability and failures.
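The disk part of this check is easy to automate, for example from a cron job on the runner host. The sketch below is one possible approach: `disk_used_percent` and `disk_is_over` are hypothetical helpers, and the 90 percent threshold in the example call is an arbitrary assumption.

```shell
# Hypothetical helper: print the used-space percentage (digits only)
# of the filesystem that holds the given path.
disk_used_percent() {
  # df -P prints POSIX-format output; on the second line, the fifth
  # field is the capacity, e.g. "42%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeeds if usage is at or above the threshold.
# $1 = path, $2 = threshold in percent
disk_is_over() {
  [ "$(disk_used_percent "$1")" -ge "$2" ]
}

# Example call:
#   if disk_is_over / 90; then
#     echo "root filesystem is at $(disk_used_percent /)% capacity"
#   fi
```

With the Docker executor, the filesystem that holds Docker's data directory is usually the one worth watching, since images and job caches accumulate there.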

SSL Certificate Issues (Self-Signed or Expired)

- Diagnosis: If your GitLab instance uses a self-signed SSL certificate or its certificate has expired, the runner might fail to connect securely. The `curl` command from the network connectivity check will likely show SSL errors. Check the runner logs (`journalctl -u gitlab-runner -f`) for messages like "x509: certificate signed by unknown authority" or "certificate has expired".
- Fix:
  - Option A (Recommended): Configure your GitLab instance with a valid, trusted SSL certificate (e.g., from Let’s Encrypt).
  - Option B (Less Secure): If you must use a self-signed certificate, you need to tell the runner to trust it. Copy the CA certificate (or the self-signed certificate itself if it’s acting as its own CA) to the runner machine, e.g., `/etc/gitlab-runner/certs/gitlab.example.com.crt`. Then, in `/etc/gitlab-runner/config.toml`, add or modify the `tls-ca-file` setting:

        [[runners]]
          url = "https://your.gitlab.instance.com/"
          token = "..."
          tls-ca-file = "/etc/gitlab-runner/certs/gitlab.example.com.crt"
          executor = "docker"
          # ... rest of config

    Restart the `gitlab-runner` service.
  - Option C (Insecure, Not Recommended): For testing or highly controlled environments, you may be tempted to disable TLS verification entirely, but this is a significant security risk. Note that `tls_verify` under `[runners.docker]` (if using the Docker executor) controls TLS for the connection to the Docker daemon, not to GitLab, so setting it to `false` does not bypass GitLab certificate errors. For Git operations inside jobs, the `GIT_SSL_NO_VERIFY=true` variable skips verification. This is generally not advisable for production.
- Why it works: The runner needs to establish a secure TLS connection to the GitLab API. If it cannot verify the server’s identity due to an untrusted or expired certificate, it will refuse to connect. Providing the correct CA certificate lets the runner verify the server, so the connection can proceed.
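Expired certificates can be caught before they break the runner. The sketch below assumes the `openssl` command-line tool is installed; `cert_valid_for` is a hypothetical helper built on `openssl x509 -checkend`, which exits with status 0 only if the certificate is still valid after the given number of seconds.

```shell
# Hypothetical helper: succeeds if the PEM certificate in file $1
# remains valid for at least $2 more days.
cert_valid_for() {
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" > /dev/null
}

# Example call, using the certificate path from the text above:
#   if ! cert_valid_for /etc/gitlab-runner/certs/gitlab.example.com.crt 14; then
#     echo "certificate expires within 14 days (or has already expired)"
#   fi
#
# To check the certificate the server actually presents, it can be
# fetched first (hostname is a placeholder):
#   openssl s_client -connect your.gitlab.instance.com:443 \
#     -servername your.gitlab.instance.com < /dev/null 2> /dev/null |
#     openssl x509 -outform PEM > /tmp/gitlab-server.pem
```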
After resolving these common issues, the next error you might encounter is related to specific job execution failures, such as insufficient disk space within a Docker container, missing dependencies, or permission errors in your CI scripts.