GitLab CI can deploy to Kubernetes using kubectl and Helm. Surprisingly often, though, the deploy job fails not because of Kubernetes or Helm, but because the GitLab Runner itself lacks the network access or permissions it needs.

Let’s say you’ve got a GitLab CI pipeline that looks something like this:

stages:
  - deploy

deploy_to_staging:
  stage: deploy
  image: dtzar/helm-kubectl:latest # A common image with kubectl and helm
  script:
    - echo "Configuring kubectl..."
    - mkdir -p ~/.kube
    - echo "$KUBE_CONFIG" | base64 -d > ~/.kube/config # Assuming KUBE_CONFIG is a CI/CD variable
    - kubectl config use-context my-staging-cluster
    - echo "Installing/Upgrading Helm chart..."
    - helm upgrade --install my-app ./charts/my-app --namespace staging --values ./deploy/staging.yaml
  environment:
    name: staging
    url: http://my-app.staging.example.com

When this job runs, it’s supposed to authenticate to your Kubernetes cluster using a KUBE_CONFIG variable, and then deploy or update your application using Helm. The helm upgrade --install command is the workhorse here. It checks if my-app already exists in the staging namespace. If it does, it upgrades it; if not, it installs it. The --values ./deploy/staging.yaml part tells Helm which configuration specific to the staging environment to use.

The problem often isn’t with the Helm command itself, or even the Kubernetes manifest files within your chart. It’s usually that the GitLab Runner executing this script can’t reach your Kubernetes API server. The KUBE_CONFIG might be perfectly valid, but if the Runner is behind a firewall that blocks outbound connections to your-k8s-api.example.com:6443, kubectl and Helm will just time out, or return connection refused errors.
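One way to separate a network problem from everything else is to probe the API server before kubectl or Helm runs. The sketch below assumes curl is available in the job image and that you pass in the API server URL yourself; check_api_reachable is a hypothetical helper name, not something GitLab or kubectl provides.

```shell
# Hypothetical helper: reports whether the API server answers at all.
# Argument: base URL of the API server, e.g. https://your-k8s-api.example.com:6443
check_api_reachable() {
  url="$1"
  # -k skips TLS verification because this only tests the network path.
  # --max-time makes a silently dropped connection fail fast.
  # %{http_code} prints 000 when no HTTP response was received.
  code=$(curl -k -s -o /dev/null --max-time 10 -w '%{http_code}' "$url/version") || code=000
  if [ -z "$code" ] || [ "$code" = "000" ]; then
    echo "unreachable"
    return 1
  fi
  # Any HTTP status (200, 401, 403) proves the network path is open.
  echo "reachable (HTTP $code)"
}
```

Calling check_api_reachable "https://your-k8s-api.example.com:6443" early in the job’s script would turn a blocked connection into a quick, clearly labeled failure instead of a long kubectl timeout.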

Here’s a breakdown of how it actually works under the hood, and what you can do when it breaks:

1. The KUBE_CONFIG Variable: This is a base64 encoded string of your Kubernetes configuration file. It contains the cluster endpoint, user credentials (like certificates or tokens), and context information. When you run echo "$KUBE_CONFIG" | base64 -d > ~/.kube/config, you’re essentially creating a temporary kubeconfig file on the GitLab Runner for that specific job. kubectl and Helm then use this file to find and authenticate with your cluster.

2. kubectl config use-context my-staging-cluster: This command tells kubectl and Helm which set of credentials and cluster endpoint to use from the kubeconfig file. If your kubeconfig has multiple contexts, this ensures you’re talking to the right cluster.

3. helm upgrade --install my-app ./charts/my-app --namespace staging --values ./deploy/staging.yaml: This is where the magic (or the failure) happens. Helm parses your chart, merges it with the values from staging.yaml, and then translates this into Kubernetes API calls. It needs to talk to the Kubernetes API server to create or update Deployments, Services, ConfigMaps, etc.
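The first two steps above can be made a little more defensive. This is a minimal sketch, assuming KUBE_CONFIG holds the base64-encoded kubeconfig as in the pipeline above; write_kubeconfig and context_exists are hypothetical helper names introduced only for illustration.

```shell
# Decode the base64 kubeconfig from $KUBE_CONFIG into the given path.
write_kubeconfig() {
  target="$1"
  mkdir -p "$(dirname "$target")"
  # Fail loudly on an empty variable instead of writing an empty file.
  if [ -z "${KUBE_CONFIG:-}" ]; then
    echo "KUBE_CONFIG is empty or unset" >&2
    return 1
  fi
  # Strip whitespace first so line-wrapped base64 input still decodes.
  if ! printf '%s' "$KUBE_CONFIG" | tr -d '[:space:]' | base64 -d > "$target"; then
    echo "KUBE_CONFIG is not valid base64" >&2
    return 1
  fi
  chmod 600 "$target"  # the file holds cluster credentials
}

# Succeed only if the named context exists in the given kubeconfig.
context_exists() {
  kubectl --kubeconfig "$1" config get-contexts -o name | grep -Fqx "$2"
}
```

In the job, something like write_kubeconfig ~/.kube/config followed by context_exists ~/.kube/config my-staging-cluster would surface an empty variable or a mistyped context name before Helm is ever invoked.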

Common Failure Points and Fixes:

  • Network Egress from GitLab Runner: This is the most common culprit.

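    • Diagnosis: From within a running GitLab Runner pod (if using the Kubernetes executor) or on the physical/VM runner, try curl -v https://your-k8s-api.example.com:6443 (add -k if the cluster uses a private CA). If the network is the problem, you’ll see "Connection refused" or a timeout. Any HTTP response, even a 401 or 403, means the network path itself is fine.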
    • Fix: Ensure your GitLab Runner’s network egress rules allow it to connect to your Kubernetes API server’s IP address and port (usually 6443). If using a Kubernetes executor, this means configuring your Kubernetes network policies or firewall rules for the runner namespace. For VM/bare-metal runners, it’s your host’s firewall or cloud provider’s security groups.
    • Why it works: kubectl and Helm communicate with the Kubernetes API server over HTTPS. Without network access, these requests can’t even reach the server.
  • Incorrect KUBE_CONFIG: The variable might be malformed, truncated, or contain expired credentials.

    • Diagnosis: In your CI job logs, look for errors like error: You must be logged in to the server (Unauthorized) when the credentials are rejected, or error loading config file when the decoded file is malformed. You can also run kubectl --kubeconfig ~/.kube/config get pods directly in the CI script to test the config file.
    • Fix: Regenerate your KUBE_CONFIG variable in GitLab’s CI/CD settings. Encode the entire kubeconfig file as a single unwrapped base64 string (for example with base64 -w0 on GNU coreutils) and make sure nothing is cut off at either end when you paste it. For token-based auth, check that the token hasn’t expired.
    • Why it works: A valid kubeconfig is the key to authentication. If it’s invalid, the API server will reject your requests.
  • RBAC Permissions: The service account configured in your KUBE_CONFIG (or the user if using direct user auth) might not have sufficient permissions in the target Kubernetes namespace.

    • Diagnosis: You’ll see errors like Error from server (Forbidden): deployments.apps is forbidden: User "system:serviceaccount:gitlab-runner:default" (or similar) cannot list resource "deployments" in API group "apps" in the namespace "staging".
    • Fix: Create or update a Role and RoleBinding (or ClusterRole and ClusterRoleBinding) in Kubernetes that grants the necessary permissions. For example, to allow Helm to manage all resources in the staging namespace, you might need create, get, list, watch, update, patch, and delete permissions for most resource types.
    • Why it works: Kubernetes uses Role-Based Access Control (RBAC) to enforce authorization. Helm needs these permissions to create, update, and delete Kubernetes objects on your behalf.
  • Helm Tiller (for Helm v2): If you are still on Helm v2, the Tiller server component needs to be running in your cluster, Tiller’s own service account needs the necessary permissions, and the credentials in your KUBE_CONFIG must be allowed to reach Tiller.

    • Diagnosis: Errors like Error: could not find tiller, or connection refused when Helm tries to connect to Tiller.
    • Fix: Install Tiller in your cluster (helm init --tiller-image <tiller-image>) and ensure the service account Tiller uses has the correct RBAC permissions (often cluster-admin, which is highly discouraged for security reasons and why Helm v3 removed Tiller).
    • Why it works: Helm v2 requires a server-side component (Tiller) running in the cluster to manage releases.
  • Incorrect Kubernetes Context: The kubectl config use-context my-staging-cluster command might be pointing to the wrong cluster or using incorrect credentials.

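    • Diagnosis: If the context name doesn’t exist in the kubeconfig, use-context fails with an error like error: no context exists with the name: "my-staging-cluster". If it exists but points elsewhere, kubectl commands might succeed but operate on the wrong cluster, or kubectl get nodes might return an error indicating the cluster is unreachable.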
    • Fix: Verify the context name in your KUBE_CONFIG and ensure it matches the one specified in the use-context command. You can list available contexts with kubectl config get-contexts.
    • Why it works: The context is the bridge between your local kubectl/Helm client and the specific Kubernetes cluster you intend to interact with.
  • GitLab Runner Configuration: If using the Kubernetes executor for GitLab Runners, the runner’s service account might lack permissions to create/manage pods in the namespace where the CI job is running, or it might not be able to pull the Docker image.

    • Diagnosis: The CI job might fail to even start, or it might fail early with errors related to Kubernetes API calls made by the runner controller itself, not your script.
    • Fix: Ensure the gitlab-runner’s service account in your Kubernetes cluster has the necessary RBAC permissions to create pods in the target namespace, and that the cluster can pull the dtzar/helm-kubectl:latest image from its registry (with image pull secrets configured if the registry is private).
    • Why it works: The runner itself needs to provision and manage the CI job’s execution environment within Kubernetes before your script even runs.
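The RBAC failure point above can be checked up front rather than discovered halfway through a Helm upgrade. The sketch below uses kubectl auth can-i, which exits non-zero when the answer is "no". The list of resources and verbs is an assumption for illustration and should be adjusted to what your chart actually creates; check_rbac is a hypothetical helper name.

```shell
# Print every verb/resource pair the current credentials may NOT use
# in the given namespace. Returns non-zero if anything is missing.
check_rbac() {
  namespace="$1"
  missing=0
  # Assumed resource and verb lists; adjust to match your chart.
  for resource in deployments services configmaps secrets; do
    for verb in get list create update patch delete; do
      # --quiet suppresses the yes/no output; only the exit code is used.
      if ! kubectl auth can-i "$verb" "$resource" --namespace "$namespace" --quiet; then
        echo "missing: $verb $resource in $namespace"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Running check_rbac staging before the Helm step would list the missing permissions in one pass, which tends to be easier to act on than fixing one Forbidden error per pipeline run.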

Once all these are sorted, the next error you’ll likely encounter is a Helm-specific error related to a syntax issue in your chart’s values.yaml files or a Kubernetes resource definition that’s invalid for your cluster’s API version.
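Those chart-level errors can often be caught before the real upgrade. This is a sketch under a few assumptions: Helm v3 (for the helm template NAME CHART form), a kubectl recent enough to support --dry-run=server, and credentials that are allowed to perform server-side dry runs. validate_chart is a hypothetical helper name.

```shell
# Lint the chart, then render it and let the API server validate the
# result without persisting anything.
validate_chart() {
  chart="$1"
  values="$2"
  namespace="$3"
  # Catches chart and values syntax errors without contacting the cluster.
  helm lint "$chart" --values "$values" || return 1
  # Server-side dry run rejects resources that are invalid for this
  # cluster's API versions.
  helm template my-app "$chart" --namespace "$namespace" --values "$values" \
    | kubectl apply --dry-run=server --namespace "$namespace" -f -
}
```

A call such as validate_chart ./charts/my-app ./deploy/staging.yaml staging placed just before helm upgrade --install would fail the job early on a broken chart while leaving the existing release untouched.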
