GKE’s GPU node pools are a game-changer for machine learning, but their setup often feels more like wrestling with a black box than a controlled deployment.

Let’s see a GPU node pool in action. Imagine we’re spinning up a cluster for TensorFlow training.

gcloud container clusters create ml-cluster \
  --zone us-central1-a \
  --num-nodes 1 \
  --machine-type n1-standard-8 \
  --enable-autoscaling --min-nodes 1 --max-nodes 5 \
  --release-channel rapid \
  --cluster-version 1.27.5-gke.100

Now we add a GPU-enabled node pool. The --accelerator flag is the important part: it attaches GPUs to every node in the pool.

gcloud container node-pools create gpu-pool \
  --cluster ml-cluster \
  --zone us-central1-a \
  --accelerator type=nvidia-tesla-t4,count=2,gpu-driver-version=default \
  --num-nodes 1 \
  --machine-type n1-standard-8 \
  --enable-autoscaling --min-nodes 1 --max-nodes 3

This creates a new set of nodes, each with two NVIDIA T4 GPUs attached. GKE can install the NVIDIA driver for you when you set the gpu-driver-version key of --accelerator (default or latest), and it runs the Kubernetes device plugin on these nodes itself. When you deploy a pod that requests GPUs, Kubernetes schedules it onto one of these GPU nodes.

The core problem GKE GPU node pools solve is simplifying the complex task of provisioning and managing hardware accelerators for demanding workloads like deep learning. Traditionally, this involved manual driver installation, CUDA toolkit management, and intricate Kubernetes scheduling configurations. GKE abstracts this away.

Internally, when you specify an accelerator type, GKE provisions Compute Engine VMs with the corresponding GPUs attached. For NVIDIA GPUs, it runs the NVIDIA device plugin for Kubernetes on those nodes. The plugin discovers the GPUs and exposes them to Kubernetes as a schedulable resource. GKE also makes the GPU devices and driver libraries available inside any container that is allocated a GPU, so CUDA code in the container can reach the hardware.
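A quick way to confirm the device plugin has done its job is to check how many GPUs each node reports as allocatable. The sketch below defines a small, hypothetical helper, sum_gpus, that totals the GPU column of kubectl's custom-columns output. Nodes without GPUs report <none> and are skipped. The live kubectl invocation is left as a comment because it needs cluster credentials. The node names in the offline demo are made up.

```shell
# Hypothetical helper: total the GPU column of "NAME GPU" lines.
# Nodes without GPUs report <none>, which the numeric match skips.
sum_gpus() {
  awk '$2 ~ /^[0-9]+$/ { total += $2; nodes++ }
       END { printf "%d GPU(s) on %d node(s)\n", total, nodes }'
}

# Against a live cluster (requires kubectl credentials for ml-cluster):
#   kubectl get nodes --no-headers \
#     "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" | sum_gpus

# Offline demo with made-up node names; prints: 2 GPU(s) on 1 node(s)
printf 'gke-ml-cluster-gpu-pool-a 2\ngke-ml-cluster-default-pool-b <none>\n' | sum_gpus
```

If the GPU pool's nodes show <none> here, the GPUs are attached to the VM but Kubernetes cannot see them yet. That usually points at the driver or device plugin not being ready on those nodes.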

The key levers you control are:

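  • type (a key of --accelerator): The GPU model you want (e.g., nvidia-tesla-t4, nvidia-tesla-a100, nvidia-l4). Choose based on your workload’s performance and cost requirements. The model also constrains the machine type: T4s attach to N1 machines, while A100s require A2 machines and L4s require G2 machines.
  • count (a key of --accelerator): The number of GPUs per node. For multi-GPU training on a single node, you might want more than one.
  • GPU memory: There is no flag for this. VRAM is fixed by the GPU model: 16 GB on a T4, 24 GB on an L4, and 40 GB on nvidia-tesla-a100 (80 GB on nvidia-a100-80gb). For large models that need significant VRAM, pick the model accordingly.
  • --machine-type: Even with GPUs attached, the VM's CPU and RAM still matter for data loading and preprocessing. Make sure they are sized so the GPUs are not left waiting on input.
  • Autoscaling limits: Crucial for cost management. Your worst case is max-nodes times count GPUs, so you don’t want to accidentally scale up to far more expensive GPU nodes than you budgeted for. Set sensible min-nodes and max-nodes.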
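To keep these levers in one place, one option is a small wrapper script. The following is a sketch under stated assumptions, not an official tool. GPU_BUDGET and DRY_RUN are hypothetical variables introduced here. The script computes the autoscaling worst case (max-nodes times count) and refuses to proceed if that exceeds the budget. By default it only prints the gcloud command it would run.

```shell
#!/bin/sh
# Sketch: keep the GPU node-pool levers in variables and check the
# autoscaling worst case before creating anything.
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it.
set -eu

CLUSTER=ml-cluster
ZONE=us-central1-a
GPU_TYPE=nvidia-tesla-t4     # GPU model; also fixes VRAM per GPU (16 GB on a T4)
GPUS_PER_NODE=2              # count key of --accelerator
MACHINE_TYPE=n1-standard-8   # T4s attach to N1 machine types
MIN_NODES=1
MAX_NODES=3
GPU_BUDGET=8                 # hypothetical cap on total GPUs you are willing to pay for

# Worst case once the autoscaler reaches max-nodes.
max_gpus=$((MAX_NODES * GPUS_PER_NODE))
if [ "$max_gpus" -gt "$GPU_BUDGET" ]; then
  echo "refusing: autoscaling could reach $max_gpus GPUs (budget $GPU_BUDGET)" >&2
  exit 1
fi

# Build the command in the positional parameters so quoting survives.
set -- gcloud container node-pools create gpu-pool \
  --cluster "$CLUSTER" \
  --zone "$ZONE" \
  --accelerator "type=$GPU_TYPE,count=$GPUS_PER_NODE,gpu-driver-version=default" \
  --machine-type "$MACHINE_TYPE" \
  --num-nodes "$MIN_NODES" \
  --enable-autoscaling --min-nodes "$MIN_NODES" --max-nodes "$MAX_NODES"

echo "worst case: $max_gpus x $GPU_TYPE"
if [ "${DRY_RUN:-1}" = 1 ]; then
  printf '%s ' "$@"; echo
else
  "$@"
fi
```

Run it once as-is to review the printed command, then rerun with DRY_RUN=0 to create the pool. Raising MAX_NODES to 5 with count=2 would trip the budget check (10 GPUs against a cap of 8) before any GPU is provisioned.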

What most people miss is that GKE’s GPU node pools aren’t just about attaching hardware. They are about enabling the Kubernetes scheduler to understand and allocate these specialized resources. The NVIDIA device plugin, which GKE runs for you, registers with the kubelet on each GPU node, and the kubelet advertises the GPUs as the nvidia.com/gpu resource in the node’s status. When a pod declares nvidia.com/gpu: 1 in its resources.limits, the scheduler sees these available GPUs and places the pod on a node that can satisfy the request. Without this plugin, the GPUs would be present on the VM but invisible to Kubernetes.
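Here is a minimal sketch of a pod that makes such a request. It writes the manifest to a file so you can inspect it before applying. The pod name and the nvidia/cuda image tag are illustrative assumptions, so swap in your own training image. The nodeSelector uses GKE’s cloud.google.com/gke-accelerator node label to pin the GPU model. It is optional, because the nvidia.com/gpu limit alone is enough to land the pod on a GPU node. GPUs are requested only in limits. Kubernetes uses the limit as the request, and whole GPUs are allocated per container.

```shell
# Sketch: write a Pod manifest that asks for one GPU.
# The pod name and image tag are illustrative assumptions.
cat > gpu-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test        # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4   # pin the GPU model
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04          # assumed image tag
    command: ["sleep", "3600"]
    resources:
      limits:
        nvidia.com/gpu: 1     # resource advertised via the device plugin
EOF

# Then, against a real cluster:
#   kubectl apply -f gpu-pod.yaml
#   kubectl get pod gpu-smoke-test -o wide   # NODE should be a gpu-pool node
```

If the pod stays Pending, kubectl describe pod gpu-smoke-test shows whether the scheduler found no node with a free nvidia.com/gpu.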

Once your GPU node pools are set up and your workloads are successfully requesting and utilizing GPUs, your next challenge will be optimizing GPU utilization and managing GPU memory effectively.