Run GPU Workloads in Docker with NVIDIA Container Toolkit (2026)

The NVIDIA Container Toolkit doesn’t virtualize your GPU or pass the hardware through to the container; it injects the host’s NVIDIA driver libraries and GPU device nodes into the container, so CUDA applications inside it talk to the host’s driver as if they were running directly on the host.

Let’s see it in action. Imagine you’ve got a simple Python script that checks for CUDA availability and prints the GPU name.

import torch

if torch.cuda.is_available():
    print(f"CUDA is available! Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available. Running on CPU.")

Now, let’s build a Docker image that can run this. We’ll start with a base image that already ships Python, PyTorch, and the CUDA runtime libraries. The NVIDIA Container Toolkit itself is installed on the host, not in the image, so the Dockerfile needs nothing GPU-specific.

First, your Dockerfile:

# Use a base image with Python and PyTorch pre-installed
# This tag bundles CUDA 11.8; pick a tag whose CUDA version your host driver supports
FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime

# Install additional system tools if needed (git and wget are just examples)
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Copy your Python script into the container
COPY check_gpu.py /app/check_gpu.py

# Set the working directory
WORKDIR /app

# Command to run the script
CMD ["python", "check_gpu.py"]

You’ll build this like any other Docker image:

docker build -t gpu-check-app .

Now, the crucial part is how you run this container. You need to request GPUs explicitly with the --gpus flag.

docker run --gpus all gpu-check-app

If everything is set up correctly, and you have an NVIDIA GPU with compatible drivers installed on your host machine, the output will look something like this:

CUDA is available! Using GPU: NVIDIA GeForce RTX 3090

This docker run --gpus all command is the magic. It tells the Docker daemon to attach a GPU device request to the container, which the NVIDIA Container Toolkit fulfills when the container starts. The toolkit mounts the necessary NVIDIA driver libraries, utilities such as nvidia-smi, and device nodes from your host into the container. It’s not virtualization or PCI passthrough, but rather a carefully curated set of components that lets CUDA-aware applications within the container talk to the host’s GPU through the host’s kernel driver.
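To see what was actually injected, you can inspect the container from the inside. The following is a minimal sketch, not part of the toolkit: describe_gpu_injection is a hypothetical helper name, and it assumes the usual /dev/nvidia* device node naming and the NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES environment variables.

```python
import glob
import os


def describe_gpu_injection(environ=None, dev_glob="/dev/nvidia*"):
    """Summarize GPU device nodes and NVIDIA env vars visible to this process.

    Hypothetical helper for illustration; pass `environ` or `dev_glob`
    explicitly to inspect something other than the current process.
    """
    environ = os.environ if environ is None else environ
    return {
        # Device nodes such as /dev/nvidia0 and /dev/nvidiactl, if present.
        "device_nodes": sorted(glob.glob(dev_glob)),
        # Which GPUs were requested (None if the variable is unset).
        "visible_devices": environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Which driver features were requested (None if unset).
        "driver_capabilities": environ.get("NVIDIA_DRIVER_CAPABILITIES"),
    }


if __name__ == "__main__":
    for key, value in describe_gpu_injection().items():
        print(f"{key}: {value}")
```

Run without --gpus, the same script should report an empty device list, which makes it a quick way to tell a toolkit problem apart from a PyTorch problem.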

The core problem this solves is dependency management and reproducibility for GPU-accelerated workloads. Instead of trying to install specific CUDA toolkit versions and cuDNN libraries on every single machine where you want to run your AI models, you package those dependencies into a Docker image. This ensures that your application runs against the same CUDA stack regardless of the host system’s configuration, as long as the host has a compatible NVIDIA driver and the NVIDIA Container Toolkit installed.

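Internally, the NVIDIA Container Toolkit hooks into container creation. When you use docker run --gpus all, Docker attaches a GPU device request to the container and invokes the toolkit’s prestart hook (nvidia-container-runtime-hook), which must be on the host’s PATH; this path does not require an nvidia entry in /etc/docker/daemon.json. The alternative is to register nvidia as a runtime in /etc/docker/daemon.json and select it with --runtime=nvidia (or make it the default runtime), in which case Docker delegates container creation to the NVIDIA runtime, a thin wrapper around runc that adds the same hook. Either way, the hook performs the necessary steps: it identifies the requested GPUs on the host, selects the matching driver libraries (such as libcuda.so and libnvidia-ml.so) and utilities (such as nvidia-smi), and mounts them into the container alongside the /dev/nvidia* device nodes. The CUDA toolkit under /usr/local/cuda is not mounted from the host; it comes from the image. Environment variables like NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES tell the toolkit which GPU(s) and driver features to expose.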
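If you want to know whether an nvidia runtime is registered in /etc/docker/daemon.json, a few lines of Python will tell you. This is a rough sketch: nvidia_runtime_status is a hypothetical helper, and it assumes the common layout with a top-level "runtimes" object and an optional "default-runtime" key.

```python
import json


def nvidia_runtime_status(daemon_json_text):
    """Return (registered, is_default) for the 'nvidia' runtime.

    `daemon_json_text` is the raw content of daemon.json; an empty string
    is treated as an empty configuration.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    registered = "nvidia" in config.get("runtimes", {})
    is_default = config.get("default-runtime") == "nvidia"
    return registered, is_default


if __name__ == "__main__":
    try:
        with open("/etc/docker/daemon.json", encoding="utf-8") as handle:
            text = handle.read()
    except OSError:
        # Missing or unreadable file: treat it as no custom configuration.
        text = ""
    registered, is_default = nvidia_runtime_status(text)
    print(f"nvidia runtime registered: {registered}, default: {is_default}")
```

A result of (False, False) does not necessarily mean GPUs are unavailable; it only says the runtime is not registered in that file.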

The levers you control are mainly your Dockerfile and the docker run command.

In your Dockerfile:

Base Image: Choosing a base image with the correct CUDA and cuDNN version is paramount. Images like pytorch/pytorch:X.Y.Z-cudaA.B-cudnnC-runtime are specifically built for this.
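Package Installation: Any additional libraries your application needs (e.g., nvidia-ml-py for querying GPU state through NVML from Python) would be installed here using pip or apt-get. nvidia-smi itself does not need installing; the toolkit mounts it from the host when the utility capability is enabled.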

In your docker run command:

--gpus all: This is the simplest and most common way to expose all available GPUs.
--gpus '"device=0,2"': You can specify which GPUs to expose by their index. This is useful if you have multiple GPUs and want to isolate workloads.
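--gpus 'all,"capabilities=compute,utility"': You can also restrict which driver capabilities (compute, utility, graphics, video, and so on) are made available, which controls the driver libraries mounted into the container.
Environment Variables: If you run with --runtime=nvidia instead of the --gpus flag, you can set NVIDIA_VISIBLE_DEVICES (and NVIDIA_DRIVER_CAPABILITIES) yourself, though it’s less common.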
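The quoting in these --gpus values is easy to get wrong, so it can help to assemble the command programmatically. Here is a minimal sketch under a few assumptions: docker_run_gpu_args is a hypothetical helper, and the field names device and capabilities are the keys the Docker CLI parses, to the best of my knowledge. Each field is wrapped in double quotes because its value contains commas.

```python
import shlex


def docker_run_gpu_args(image, devices=None, capabilities=None):
    """Build an argv list for `docker run` with a --gpus request.

    devices: None to request all GPUs, or a list of GPU indices/UUIDs.
    capabilities: optional list of driver capabilities,
                  e.g. ["compute", "utility"].
    """
    fields = []
    if devices is None:
        fields.append("all")
    else:
        # Quote the field so the commas between IDs are not read as
        # separators between --gpus fields.
        fields.append('"device={}"'.format(",".join(str(d) for d in devices)))
    if capabilities:
        fields.append('"capabilities={}"'.format(",".join(capabilities)))
    return ["docker", "run", "--gpus", ",".join(fields), image]


if __name__ == "__main__":
    argv = docker_run_gpu_args("gpu-check-app", devices=[0, 2])
    # shlex.join adds the outer single quotes a shell would need.
    print(shlex.join(argv))
```

Passing the list straight to subprocess.run avoids shell quoting entirely; shlex.join is only there to show the form you would type by hand.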

The one thing most people don’t realize is that the NVIDIA driver must be installed on the host machine, and its version needs to be compatible with the CUDA toolkit version you’re using inside the container. The toolkit doesn’t bundle a driver; it exposes the host’s driver to the container. If your host driver is too old for the CUDA version in your container, you’ll get errors like "CUDA driver version is insufficient for CUDA runtime version."
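A pre-flight check can catch this mismatch before a container is even started. The sketch below compares a host driver version against a minimum for the container’s CUDA major version. The helper names are hypothetical, and the minimum versions in the table are my assumption of the Linux values NVIDIA publishes for minor-version compatibility; check them against NVIDIA’s CUDA compatibility documentation before relying on them.

```python
def parse_version(text):
    """Turn a dotted version string such as '535.104.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# ASSUMED minimum Linux driver versions per CUDA major release.
# Verify these against NVIDIA's CUDA compatibility documentation.
MIN_DRIVER_FOR_CUDA_MAJOR = {
    11: "450.80.02",
    12: "525.60.13",
}


def driver_supports_cuda(driver_version, cuda_version,
                         minimums=MIN_DRIVER_FOR_CUDA_MAJOR):
    """Return True if the host driver meets the assumed minimum for this CUDA version."""
    cuda_major = parse_version(cuda_version)[0]
    if cuda_major not in minimums:
        raise ValueError(f"no minimum driver recorded for CUDA {cuda_major}.x")
    return parse_version(driver_version) >= parse_version(minimums[cuda_major])


if __name__ == "__main__":
    # Example values only; substitute your own host driver and image CUDA version.
    print(driver_supports_cuda("470.57.02", "11.8"))
    print(driver_supports_cuda("470.57.02", "12.1"))
```

On the host, nvidia-smi --query-gpu=driver_version --format=csv,noheader prints the driver version, and inside the container torch.version.cuda reports the CUDA version PyTorch was built against. Meeting the minimum does not guarantee that every feature of a newer CUDA minor release works, so treat a True result as necessary rather than sufficient.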

The next conceptual hurdle you’ll hit is managing GPU memory and performance tuning within containers, which often involves understanding nvidia-smi output and how it maps to your containerized processes.