GitHub Actions can leverage GPU-accelerated hardware for your CI/CD workflows, but only on specific runner types and with careful configuration.

This guide covers selecting a GPU runner, installing drivers and libraries, configuring self-hosted runners, and caching dependencies.

Setting Up GPU Runners

First, select a runner that has GPU hardware attached. That is either a self-hosted runner on a machine you manage, or a GitHub-hosted GPU runner, which GitHub offers as a type of larger runner (check the documentation for plan availability). You can find the labels available to you in the GitHub Actions documentation or in the runner list in your organization or repository settings.

To target a GPU runner in your workflow, you’ll specify the runs-on field in your job definition. For example:

jobs:
  gpu_job:
    runs-on: [self-hosted, gpu]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run GPU workload
        run: |
          # Your GPU-accelerated commands here
          echo "Running on a GPU runner!"
          # Example: nvidia-smi to check GPU status
          nvidia-smi

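The [self-hosted, gpu] syntax tells Actions to pick a self-hosted runner that carries both labels. The self-hosted label is applied automatically; gpu is a custom label you assign yourself. If you’re using GitHub-hosted runners with GPU capabilities, runs-on takes the label the runner was given when it was set up. A value like runs-on: [ubuntu-latest-gpu] is only a placeholder here, not a built-in label. Always check the official GitHub Actions runner documentation for current availability.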
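For a GitHub-hosted GPU runner the job has the same shape, and only the label changes. The following is a minimal sketch, assuming your organization has set up a GPU runner and named it gpu-t4-linux. That name is hypothetical; substitute whatever label appears in your runner settings. The timeout-minutes value is also just an example, added because GPU time tends to be expensive.

```yaml
jobs:
  gpu_job:
    # Hypothetical label: replace with the name of your GPU runner.
    runs-on: gpu-t4-linux
    # Example cap so a hung job does not hold the GPU runner indefinitely.
    timeout-minutes: 30
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Show the allocated GPU
        # Fails if the runner image has no NVIDIA driver installed.
        run: nvidia-smi
```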

Installing GPU Drivers and Libraries

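The most common pitfall is assuming the GPU drivers and necessary libraries (like CUDA or cuDNN for NVIDIA GPUs) are already on the runner. On a self-hosted machine they exist only if you installed them, and on other runners it depends on the image. Verify first, then install whatever is missing, either once on the host or as part of your workflow.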

For NVIDIA GPUs:

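  1. Install NVIDIA Drivers: This is crucial. The kernel driver has to be installed on the runner host itself; a Docker image cannot supply it. CUDA images contain only user-space libraries and rely on the host driver, which the NVIDIA Container Toolkit exposes to the container. So you either run the job in a CUDA container on a host that already has the driver, or install the driver directly.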

    • Diagnosis: Before installing, check if nvidia-smi is available. If it’s not found, drivers are missing.

    • Fix (Example using a Docker image):

      jobs:
        gpu_job:
          runs-on: [self-hosted, gpu]
          container:
            image: nvidia/cuda:11.8.0-base-ubuntu22.04 # Example CUDA version
            options: --gpus all # This is key to exposing GPUs to the container
          steps:
            - name: Checkout code
              uses: actions/checkout@v4
      
            - name: Verify GPU access
              run: nvidia-smi
      

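      The options: --gpus all in the container definition is essential for Docker to pass the host’s GPUs into the container. It only works if the host already has the NVIDIA driver and the NVIDIA Container Toolkit installed.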

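    • Fix (Example installing drivers on a bare runner): This is more complex and usually means downloading and running NVIDIA’s installer in a run step, with root privileges. Caution: This is brittle, because a kernel or OS update can break the installed driver module. On a long-lived self-hosted runner it is usually simpler to install the driver once on the host, outside the workflow.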

      jobs:
        gpu_job:
          runs-on: [self-hosted, gpu]
          steps:
            - name: Install NVIDIA Drivers
              run: |
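                # Example: Download and run NVIDIA installer
                # This is a simplified example; actual commands depend on OS and driver version
                # -f makes curl fail on an HTTP error instead of saving an error page
                curl -fLO https://us.download.nvidia.com/tesla/525.60.13/NVIDIA-Linux-x86_64-525.60.13.run
                # The installer needs root to build and load the kernel module
                sudo sh NVIDIA-Linux-x86_64-525.60.13.run --silent --dkms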
            - name: Install CUDA Toolkit
              run: |
                # Download and install CUDA toolkit (e.g., from NVIDIA's repo)
                # ... commands to add repo and install cuda-toolkit-11-8
            - name: Verify GPU access
              run: nvidia-smi
      

      This approach requires careful management of the runner’s environment.

  2. Install CUDA Toolkit and cuDNN: Once drivers are in place, you need the CUDA toolkit and cuDNN library for deep learning frameworks.

    • Diagnosis: Your deep learning framework (TensorFlow, PyTorch) will likely throw errors like "Could not load dynamic library 'libcudart.so.11.0'" or "cuDNN not found."

    • Fix (within a Docker container): Use a CUDA-enabled base image like nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04. These images ship the CUDA and cuDNN user-space libraries. They do not contain the driver: that comes from the host and is made visible inside the container by the NVIDIA Container Toolkit.

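    • Fix (on a bare runner): Download and install the specific CUDA toolkit and cuDNN versions required by your application. Ensure the PATH and LD_LIBRARY_PATH environment variables point to the CUDA binaries and libraries. Each run step starts a fresh shell, so an export in one step does not carry over to the next; append to $GITHUB_PATH and $GITHUB_ENV to make the change persist.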

      jobs:
        gpu_job:
          runs-on: [self-hosted, gpu]
          steps:
            - name: Checkout code
              uses: actions/checkout@v4
      
            - name: Install CUDA Toolkit 11.8
              run: |
                # Commands to download and install CUDA from NVIDIA's repo
                # e.g., wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
                # sudo dpkg -i cuda-keyring_1.0-1_all.deb
                # sudo apt-get update
                # sudo apt-get -y install cuda-toolkit-11-8
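                # Persist the paths for later steps (a plain `export` only lasts for this step):
                # echo "/usr/local/cuda-11.8/bin" >> "$GITHUB_PATH"
                # echo "LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:${LD_LIBRARY_PATH}" >> "$GITHUB_ENV"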
            - name: Install cuDNN 8.6
              run: |
                # Download cuDNN tarball from NVIDIA, extract, and copy files
                # to CUDA installation directories.
                # ... commands to extract and copy ...
            - name: Verify CUDA/cuDNN
              run: |
                nvcc --version
                # Check for cuDNN by importing in Python (if installed)
                # python -c "import torch; print(torch.backends.cudnn.version())"
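
The two diagnosis checks above (driver, then toolkit) can be folded into a single preflight step that fails early with a readable message. This is a minimal sketch, assuming a Linux runner with bash; the step name and error messages are illustrative, and ::error:: is the standard workflow command for raising an error annotation.

```yaml
jobs:
  gpu_job:
    runs-on: [self-hosted, gpu]
    steps:
      - name: Preflight GPU check
        shell: bash
        run: |
          # Driver check: nvidia-smi is installed together with the NVIDIA driver.
          if ! command -v nvidia-smi >/dev/null 2>&1; then
            echo "::error::nvidia-smi not found; the NVIDIA driver is missing or not on PATH"
            exit 1
          fi
          # Exits non-zero if the driver is present but cannot reach a GPU.
          nvidia-smi

          # Toolkit check: nvcc comes with the CUDA toolkit, not with the driver.
          if ! command -v nvcc >/dev/null 2>&1; then
            echo "::error::nvcc not found; the CUDA toolkit is missing or not on PATH"
            exit 1
          fi
          nvcc --version
```

Drop the nvcc check if the job only needs the CUDA runtime libraries, since runtime-only images do not include the compiler.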
      

Runner Configuration (Self-Hosted)

If you’re using self-hosted runners, the runner software itself needs to be aware of the GPU.

  1. GPU Label: When registering the self-hosted runner, you must assign it the gpu label.

    • Diagnosis: Jobs targeting runs-on: [self-hosted, gpu] will never be picked up.
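    • Fix: Pass the --labels flag when you register the runner with config.sh (run.sh only starts a runner that is already configured). The self-hosted label is added automatically, and labels can also be edited later in the runner settings. To register with the gpu label:
      ./config.sh --url https://github.com/YOUR_ORG/YOUR_REPO --token YOUR_TOKEN --labels gpu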
      
      Ensure that the machine running the self-hosted runner actually has a GPU and that the system can see it (e.g., nvidia-smi works on the host).
  2. Docker with --gpus all: If your workflow uses Docker, the Docker installation on the runner host must be able to honor the --gpus all flag.

    • Diagnosis: Containers started by the Actions runner cannot see or use the GPUs.
    • Fix: Install the NVIDIA Container Toolkit on the runner machine and configure Docker to use it, then make sure every container is actually started with GPU access. For job containers this is the options: --gpus all line in the container configuration in the workflow YAML, as shown above. If you’re running docker run commands directly in a run step, add --gpus all to those commands yourself.
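
When a step calls docker run itself, the flag goes on the command line. This is a minimal sketch, assuming Docker and the NVIDIA Container Toolkit are already installed on the runner host; the image tag is the same example used earlier.

```yaml
jobs:
  gpu_job:
    runs-on: [self-hosted, gpu]
    steps:
      - name: Verify GPU access from a container
        run: |
          # --gpus all asks Docker to expose every host GPU to the container.
          # The command fails if the host lacks NVIDIA container runtime support.
          docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```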

Caching Dependencies

GPU-dependent libraries and models can be large. Caching them significantly speeds up subsequent runs.

  • Diagnosis: Jobs take a very long time to set up, re-downloading drivers, CUDA, cuDNN, or large model files on every run.

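  • Fix: Use the actions/cache action. Cache the directories where CUDA, cuDNN, or your model artifacts are installed or downloaded. Drivers are a poor fit for caching, because they include a kernel module that has to be installed on the host. On a persistent self-hosted runner, anything installed on the host already survives between runs, so caching matters most for ephemeral runners.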

    jobs:
      gpu_job:
        runs-on: [self-hosted, gpu]
        steps:
          - name: Checkout code
            uses: actions/checkout@v4
    
          - name: Set up Python
            uses: actions/setup-python@v5
            with:
              python-version: '3.10'
    
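          - name: Cache CUDA/cuDNN
            uses: actions/cache@v4
            id: cuda-cache
            with:
              # Example paths; the runner user needs write access to restore into them.
              # Comments must stay outside the block below, or they become part of the path.
              path: |
                /usr/local/cuda-11.8
                /usr/local/cudnn-8.6
              # Adjust key based on how you install; hashFiles only sees files inside the workspace.
              key: ${{ runner.os }}-cuda-${{ hashFiles('**/cuda-keyring_1.0-1_all.deb') }}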
    
    
          - name: Install CUDA/cuDNN (if not cached)
            if: steps.cuda-cache.outputs.cache-hit != 'true'
            run: |
              # ... commands to install CUDA/cuDNN ...
    
          - name: Cache Python dependencies
            uses: actions/cache@v4
            id: python-cache
            with:
              path: |
                ~/.cache/pip
                venv
              key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    
    
          - name: Install Python dependencies (if not cached)
            if: steps.python-cache.outputs.cache-hit != 'true'
            run: |
              python -m venv venv
              source venv/bin/activate
              pip install -r requirements.txt
    
          - name: Run GPU workload
            run: |
              # Your GPU-accelerated commands here
              source venv/bin/activate
              python your_gpu_script.py
    

    The key for caching needs to be carefully crafted to ensure you get cache hits when appropriate but invalidation when the underlying installation commands change.
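
One way to get that invalidation behavior is to keep the install commands in a script committed to the repository and hash that script into the key. This is a minimal sketch, assuming a hypothetical ci/install-cuda.sh that installs into a hypothetical user-writable prefix ~/cuda-11.8; neither name comes from the workflow above.

```yaml
jobs:
  gpu_job:
    runs-on: [self-hosted, gpu]
    steps:
      # Checkout must run first so hashFiles can see the install script.
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Cache CUDA toolkit
        uses: actions/cache@v4
        id: cuda-cache
        with:
          # Hypothetical install prefix that the runner user can write to.
          path: ~/cuda-11.8
          # Editing the install script changes its hash, and therefore the key.
          key: ${{ runner.os }}-cuda-11.8-${{ hashFiles('ci/install-cuda.sh') }}

      - name: Install CUDA toolkit (if not cached)
        if: steps.cuda-cache.outputs.cache-hit != 'true'
        run: bash ci/install-cuda.sh
```

Putting the version number in the key as well keeps caches for different CUDA releases from overwriting each other.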

The next hurdle you’ll likely face is optimizing memory usage on the GPU to avoid CUDA out of memory errors.
