GitLab CI can deploy to Kubernetes using kubectl and Helm, but the surprising part is how often the CI job fails for reasons that have nothing to do with Kubernetes or Helm: the GitLab Runner itself lacks the network access or permissions it needs.
Let’s say you’ve got a GitLab CI pipeline that looks something like this:
stages:
  - deploy

deploy_to_staging:
  stage: deploy
  image: dtzar/helm-kubectl:latest # A common image with kubectl and helm
  script:
    - echo "Configuring kubectl..."
    - mkdir -p ~/.kube
    - echo "$KUBE_CONFIG" | base64 -d > ~/.kube/config # Assuming KUBE_CONFIG is a CI/CD variable
    - kubectl config use-context my-staging-cluster
    - echo "Installing/Upgrading Helm chart..."
    - helm upgrade --install my-app ./charts/my-app --namespace staging --values ./deploy/staging.yaml
  environment:
    name: staging
    url: http://my-app.staging.example.com
When this job runs, it’s supposed to authenticate to your Kubernetes cluster using a KUBE_CONFIG variable, and then deploy or update your application using Helm. The helm upgrade --install command is the workhorse here. It checks if my-app already exists in the staging namespace. If it does, it upgrades it; if not, it installs it. The --values ./deploy/staging.yaml part tells Helm which configuration specific to the staging environment to use.
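To make the install-or-upgrade decision concrete, here is a conceptual sketch of the check that helm upgrade --install performs for you. The release_action helper is hypothetical and exists only for illustration; Helm's real logic also handles cases such as previously failed releases, so treat this as a mental model rather than a replacement.

```shell
#!/bin/sh
# Conceptual sketch only: release_action is a hypothetical helper, not a Helm command.
# It prints which action `helm upgrade --install` would effectively take.
release_action() {
  release="$1"
  namespace="$2"
  # `helm status` exits non-zero when it cannot find the release in the namespace.
  # Caveat: it also exits non-zero when the cluster is unreachable.
  if helm status "$release" --namespace "$namespace" >/dev/null 2>&1; then
    echo "upgrade"
  else
    echo "install"
  fi
}

# Example (requires helm and cluster access):
#   release_action my-app staging
```

In a real pipeline there is no reason to hand-roll this; the single helm upgrade --install command stays the better choice because it is idempotent across first and later deployments.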
The problem often isn’t with the Helm command itself, or even the Kubernetes manifest files within your chart. It’s usually that the GitLab Runner executing this script can’t reach your Kubernetes API server. The KUBE_CONFIG might be perfectly valid, but if the Runner is behind a firewall that blocks outbound connections to your-k8s-api.example.com:6443, kubectl and Helm will just time out, or return connection refused errors.
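A quick way to tell a network problem from a credentials problem is to probe the API server with curl before kubectl or Helm run. The sketch below maps curl's documented exit codes to the likely cause. The function names are hypothetical, the /version path is assumed to be readable on your cluster, and any HTTP response (even 401 or 403) counts as reachable because it proves the network path works.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the exit codes are curl's documented ones.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable: the API server answered (any HTTP status counts)" ;;
    6)  echo "dns: could not resolve the API server hostname" ;;
    7)  echo "refused: could not connect to the host and port" ;;
    28) echo "timeout: likely a firewall silently dropping traffic" ;;
    35) echo "tls: handshake failed, possibly an intercepting proxy" ;;
    *)  echo "other: curl exit code $1" ;;
  esac
}

# probe_api URL, e.g. probe_api https://your-k8s-api.example.com:6443
probe_api() {
  # -k skips certificate verification because only reachability is tested here.
  curl -sk --max-time 10 -o /dev/null "$1/version"
  rc=$?
  explain_curl_exit "$rc"
  return "$rc"
}
```

Running probe_api as the first line of the job script makes a blocked egress path fail fast with a readable message instead of a long kubectl timeout.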
Here’s a breakdown of how it actually works under the hood, and what you can do when it breaks:
1. The KUBE_CONFIG Variable:
This is a base64 encoded string of your Kubernetes configuration file. It contains the cluster endpoint, user credentials (like certificates or tokens), and context information. When you run echo "$KUBE_CONFIG" | base64 -d > ~/.kube/config, you’re essentially creating a temporary kubeconfig file on the GitLab Runner for that specific job. kubectl and Helm then use this file to find and authenticate with your cluster.
2. kubectl config use-context my-staging-cluster:
This command tells kubectl and Helm which set of credentials and cluster endpoint to use from the kubeconfig file. If your kubeconfig has multiple contexts, this ensures you’re talking to the right cluster.
3. helm upgrade --install my-app ./charts/my-app --namespace staging --values ./deploy/staging.yaml:
This is where the magic (or the failure) happens. Helm parses your chart, merges it with the values from staging.yaml, and then translates this into Kubernetes API calls. It needs to talk to the Kubernetes API server to create or update Deployments, Services, ConfigMaps, etc.
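The first two steps above can be guarded with a small preflight so that a bad variable or a wrong context name fails with a clear message before Helm runs. This is a minimal sketch: decode_kubeconfig and context_exists are hypothetical helper names, and the check for a top-level clusters: key is only a heuristic that assumes a YAML kubeconfig.

```shell
#!/bin/sh
# To produce the variable value on your workstation (one line, no wrapping):
#   base64 < kubeconfig.yaml | tr -d '\n'

# decode_kubeconfig VALUE DEST: decode the base64 value into DEST and sanity-check it.
decode_kubeconfig() {
  value="$1"
  dest="$2"
  if [ -z "$value" ]; then
    echo "KUBE_CONFIG is empty or not set" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  if ! printf '%s' "$value" | base64 -d > "$dest" 2>/dev/null; then
    echo "KUBE_CONFIG is not valid base64 (truncated or malformed?)" >&2
    return 1
  fi
  chmod 600 "$dest"   # the file holds credentials
  # Heuristic: a YAML kubeconfig has a top-level clusters: key.
  if ! grep -q '^clusters:' "$dest"; then
    echo "decoded file does not look like a kubeconfig" >&2
    return 1
  fi
}

# context_exists NAME: reads context names on stdin, one per line.
context_exists() {
  grep -Fxq -- "$1"
}

# Example use in the job script:
#   decode_kubeconfig "$KUBE_CONFIG" ~/.kube/config
#   kubectl config get-contexts -o name | context_exists my-staging-cluster
```

Reading the context names from stdin keeps context_exists independent of kubectl, which makes it easy to try out locally with a plain list of names.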
Common Failure Points and Fixes:
- Network Egress from GitLab Runner: This is the most common culprit.
  - Diagnosis: From within a running GitLab Runner pod (if using the Kubernetes executor) or on the physical/VM runner, try curl -v https://your-k8s-api.example.com:6443. You’ll likely see "Connection refused" or a timeout.
  - Fix: Ensure your GitLab Runner’s network egress rules allow it to connect to your Kubernetes API server’s IP address and port (usually 6443). If using a Kubernetes executor, this means configuring your Kubernetes network policies or firewall rules for the runner namespace. For VM/bare-metal runners, it’s your host’s firewall or cloud provider’s security groups.
  - Why it works: kubectl and Helm communicate with the Kubernetes API server over HTTPS. Without network access, these requests can’t even reach the server.
- Incorrect KUBE_CONFIG: The variable might be malformed, truncated, or contain expired credentials.
  - Diagnosis: In your CI job logs, look for a base64 decode failure, a kubectl error about loading the config file, or authentication failures from kubectl or Helm such as error: You must be logged in to the server (Unauthorized). You can also run kubectl --kubeconfig ~/.kube/config get pods directly in the CI script to test the config file.
  - Fix: Regenerate your KUBE_CONFIG variable in GitLab’s CI/CD settings. Because the job decodes it with base64 -d, the variable must hold the base64-encoded form of the entire kubeconfig file, with nothing cut off at either end. For token-based auth, check that the token hasn’t expired.
  - Why it works: A valid kubeconfig is the key to authentication. If it’s invalid, the API server will reject your requests.
- RBAC Permissions: The service account configured in your KUBE_CONFIG (or the user if using direct user auth) might not have sufficient permissions in the target Kubernetes namespace.
  - Diagnosis: You’ll see errors like Error from server (Forbidden): deployments.apps is forbidden: User "system:serviceaccount:gitlab-runner:default" (or similar) cannot list resource "deployments" in API group "apps" in the namespace "staging".
  - Fix: Create or update a Role and RoleBinding (or ClusterRole and ClusterRoleBinding) in Kubernetes that grants the necessary permissions. For example, to allow Helm to manage all resources in the staging namespace, you might need create, get, list, watch, update, patch, and delete permissions for most resource types.
  - Why it works: Kubernetes uses Role-Based Access Control (RBAC) to enforce authorization. Helm needs these permissions to create, update, and delete Kubernetes objects on your behalf.
- Helm Tiller (for Helm v2): If you are still on Helm v2, the Helm Tiller server component needs to be running in your cluster, and the service account Tiller runs as must have the necessary permissions. The credentials in KUBE_CONFIG only need to be able to reach Tiller.
  - Diagnosis: Errors like Error: could not find tiller or connection refused when Helm tries to connect to Tiller.
  - Fix: Install Tiller in your cluster (helm init --tiller-image <tiller-image>) and ensure the service account Tiller uses has the correct RBAC permissions (often cluster-admin, which is highly discouraged for security reasons and why Helm v3 removed Tiller).
  - Why it works: Helm v2 requires a server-side component (Tiller) running in the cluster to manage releases.
- Incorrect Kubernetes Context: The kubectl config use-context my-staging-cluster command might be pointing to the wrong cluster or using incorrect credentials.
  - Diagnosis: kubectl commands might succeed but operate on the wrong cluster, or kubectl get nodes might return an empty list or an error indicating the cluster is unreachable. If the context name does not exist in the kubeconfig at all, use-context itself fails with a "no context exists" error.
  - Fix: Verify the context name in your KUBE_CONFIG and ensure it matches the one specified in the use-context command. You can list available contexts with kubectl config get-contexts.
  - Why it works: The context is the bridge between your local kubectl/Helm client and the specific Kubernetes cluster you intend to interact with.
- GitLab Runner Configuration: If using the Kubernetes executor for GitLab Runners, the runner’s service account might lack permissions to create/manage pods in the namespace where the CI job is running, or it might not be able to pull the Docker image.
  - Diagnosis: The CI job might fail to even start, or it might fail early with errors related to Kubernetes API calls made by the runner controller itself, not your script.
  - Fix: Ensure the gitlab-runner’s service account in your Kubernetes cluster has the necessary RBAC permissions to create pods in the target namespace, and that it can pull the dtzar/helm-kubectl:latest image from its registry.
  - Why it works: The runner itself needs to provision and manage the CI job’s execution environment within Kubernetes before your script even runs.
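For the RBAC failures in the list above, the fix can be expressed with kubectl's imperative commands. The following is a sketch under stated assumptions, not a recommended policy: the helm-deployer role name, the function names, and the resource list are hypothetical and should be adjusted to what your chart actually creates. Secrets are included because Helm v3 stores its release records as Secrets in the release namespace by default.

```shell
#!/bin/sh
# grant_deployer NAMESPACE SA_NAMESPACE SA_NAME
# Must be run by someone who is allowed to create Roles and RoleBindings.
grant_deployer() {
  ns="$1"
  sa_ns="$2"
  sa="$3"
  # Hypothetical resource list: extend it to match the objects in your chart.
  kubectl create role helm-deployer \
    --namespace "$ns" \
    --verb=get,list,watch,create,update,patch,delete \
    --resource=deployments.apps,services,configmaps,secrets
  kubectl create rolebinding helm-deployer \
    --namespace "$ns" \
    --role=helm-deployer \
    --serviceaccount="$sa_ns:$sa"
}

# can_deploy NAMESPACE SA_NAMESPACE SA_NAME
# Prints yes or no. Using --as requires impersonation rights for the caller.
can_deploy() {
  kubectl auth can-i create deployments.apps \
    --namespace "$1" \
    --as="system:serviceaccount:$2:$3"
}

# Example, matching the Forbidden error shown in the list above:
#   grant_deployer staging gitlab-runner default
#   can_deploy staging gitlab-runner default
```

A namespaced Role is used here instead of a ClusterRole so that the CI credentials can only touch the staging namespace.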
Once all these are sorted, the next error you’ll likely encounter is a Helm-specific error related to a syntax issue in your chart’s values.yaml files or a Kubernetes resource definition that’s invalid for your cluster’s API version.