By default, the scheduler can place a GKE Pod on any node that has room for it. To steer Pods onto specific nodes instead, you can combine node taints, tolerations, and node affinity rules.

Let’s watch this in action. Imagine we have a fleet of specialized GPU nodes that we only want a specific set of "heavy lifting" pods to run on.

First, we’ll "taint" these nodes. This is like putting a "reserved for special jobs" sign on them. Any pod that doesn’t explicitly say it can handle this "taint" will be repelled.

kubectl taint nodes <gpu-node-name> gpu=true:NoSchedule

Here, gpu=true is the key-value pair describing the taint, and NoSchedule means that pods without a matching toleration won’t be scheduled here.
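To make the filtering behavior concrete, here is a deliberately simplified Python model of what a NoSchedule taint does at scheduling time. It is not the real scheduler code. The node names are hypothetical, and a toleration is modelled as an exact copy of the taint it tolerates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # e.g. "NoSchedule"


# Hypothetical cluster: two tainted GPU nodes and one untainted CPU node.
NODES = {
    "gpu-node-1": [Taint("gpu", "true", "NoSchedule")],
    "gpu-node-2": [Taint("gpu", "true", "NoSchedule")],
    "cpu-node-1": [],
}


def feasible_nodes(pod_tolerations: set) -> list:
    """Return the nodes whose NoSchedule taints are all tolerated by the pod.

    Simplification: a toleration is modelled as an exact (key, value, effect)
    copy of the taint it tolerates.
    """
    result = []
    for name, taints in NODES.items():
        # Any NoSchedule taint the pod does not tolerate filters the node out.
        blocked = any(t.effect == "NoSchedule" and t not in pod_tolerations
                      for t in taints)
        if not blocked:
            result.append(name)
    return result


print(feasible_nodes(set()))
# ['cpu-node-1']
print(feasible_nodes({Taint("gpu", "true", "NoSchedule")}))
# ['gpu-node-1', 'gpu-node-2', 'cpu-node-1']
```

The second result shows that tolerating the taint makes the GPU nodes eligible without making them mandatory: the untainted CPU node is still a candidate.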

Now, we have our "heavy lifting" pods. We need to tell them they are allowed to land on these tainted GPU nodes. This is done with a "toleration" in the pod’s spec:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker
spec:
  containers:
  - name: worker
    image: gcr.io/my-project/gpu-worker:latest
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

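This pod spec says, "I can tolerate nodes that have the gpu=true:NoSchedule taint." Without this toleration, the scheduler would never place the gpu-worker pod on the tainted GPU nodes. Note, though, that a toleration only permits scheduling onto those nodes; it doesn’t attract the pod to them, so gpu-worker could still land on any untainted node.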
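The toleration above uses operator: "Equal", which requires the key, value, and effect to match. The sketch below models the matching rules described in the Kubernetes documentation, including operator: "Exists" (match on key only, ignoring the value) and an empty effect (match any effect). It is a simplified illustration, not the upstream implementation, and the Taint and Toleration classes are hypothetical stand-ins for the API objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key (with Exists) matches every key
    operator: str = "Equal"   # an omitted operator is treated as Equal
    value: str = ""
    effect: str = ""          # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified model of the documented toleration-matching rules."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    if tol.operator == "Exists":
        return True  # the value is ignored
    return False  # unknown operator


gpu_taint = Taint("gpu", "true", "NoSchedule")
print(tolerates(Toleration("gpu", "Equal", "true", "NoSchedule"), gpu_taint))   # True
print(tolerates(Toleration("gpu", "Equal", "false", "NoSchedule"), gpu_taint))  # False
print(tolerates(Toleration("gpu", "Exists"), gpu_taint))                         # True
print(tolerates(Toleration("gpu", "Exists", effect="NoExecute"), gpu_taint))     # False
```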

What if we want our gpu-worker pods to prefer the GPU nodes, or to run only on them? This is where node affinity comes in. Node affinity lets us express hard requirements or soft preferences about where pods can run, based on node labels.

Let’s label our GPU nodes:

kubectl label nodes <gpu-node-name-1> gpu-type=high-performance
kubectl label nodes <gpu-node-name-2> gpu-type=high-performance

Now we can add node affinity to our pod spec. This is a hard requirement: the pod can only be scheduled onto a node whose labels match.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker
spec:
  containers:
  - name: worker
    image: gcr.io/my-project/gpu-worker:latest
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu-type
            operator: In
            values:
            - high-performance
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

The requiredDuringSchedulingIgnoredDuringExecution field means the pod can only be scheduled onto a node that satisfies these conditions. If the node later loses the label, the pod keeps running (hence IgnoredDuringExecution). nodeSelectorTerms is a list of terms, and a node qualifies if it satisfies any one of them; within a term, matchExpressions lists label conditions that must all hold. Here, we’re saying the node’s gpu-type label must be high-performance.
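The evaluation of nodeSelectorTerms can be sketched in a few lines of Python. This is a simplified model, assuming plain dicts that mirror the YAML above; it handles only the In, NotIn, Exists, and DoesNotExist operators (Gt and Lt are omitted) and ignores matchFields.

```python
def expr_matches(expr: dict, labels: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return labels.get(key) not in values  # an absent label also satisfies NotIn
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not modelled here: {op}")


def term_matches(term: dict, labels: dict) -> bool:
    exprs = term.get("matchExpressions", [])
    # Expressions inside one term are ANDed; an empty term matches nothing.
    return bool(exprs) and all(expr_matches(e, labels) for e in exprs)


def required_affinity_matches(node_selector_terms: list, labels: dict) -> bool:
    # Terms are ORed: satisfying any single term is enough.
    return any(term_matches(t, labels) for t in node_selector_terms)


# Mirrors the nodeSelectorTerms in the pod spec above.
terms = [{"matchExpressions": [
    {"key": "gpu-type", "operator": "In", "values": ["high-performance"]},
]}]

print(required_affinity_matches(terms, {"gpu-type": "high-performance"}))  # True
print(required_affinity_matches(terms, {"gpu-type": "standard"}))          # False
print(required_affinity_matches(terms, {}))                                 # False
```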

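This combination of taints, tolerations, and node affinity gives us fine-grained control. Taints and tolerations are a "negative" constraint (repelling pods unless they tolerate the taint), while node affinity is a "positive" constraint (attracting pods to nodes with matching labels). Used together, they make the GPU nodes exclusive: the taint keeps other pods off them, and the affinity keeps gpu-worker off every other node.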
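A small simulation shows why the two mechanisms complement each other. This toy model applies the taint filter and the required-affinity filter in turn; the node names are hypothetical, tolerations are matched exactly, and each required label is treated as a single In expression.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)
    taints: set = field(default_factory=set)  # (key, value, effect) tuples


# Hypothetical cluster mirroring the taint and label commands above.
CLUSTER = [
    Node("gpu-node-1", {"gpu-type": "high-performance"}, {("gpu", "true", "NoSchedule")}),
    Node("gpu-node-2", {"gpu-type": "high-performance"}, {("gpu", "true", "NoSchedule")}),
    Node("cpu-node-1"),
]


def feasible(tolerations: set, required_labels: dict) -> list:
    """Names of nodes that pass both the taint filter and the affinity filter."""
    names = []
    for node in CLUSTER:
        # Taint filter: every NoSchedule taint must be tolerated.
        if any(t[2] == "NoSchedule" and t not in tolerations for t in node.taints):
            continue
        # Affinity filter: every required label must carry an allowed value.
        if any(node.labels.get(k) not in allowed for k, allowed in required_labels.items()):
            continue
        names.append(node.name)
    return names


GPU_TOLERATION = {("gpu", "true", "NoSchedule")}
GPU_AFFINITY = {"gpu-type": ["high-performance"]}

print(feasible(GPU_TOLERATION, {}))            # ['gpu-node-1', 'gpu-node-2', 'cpu-node-1']
print(feasible(set(), GPU_AFFINITY))           # []
print(feasible(GPU_TOLERATION, GPU_AFFINITY))  # ['gpu-node-1', 'gpu-node-2']
```

With only the affinity, the pod has nowhere to go, because every node it requires is tainted; it would stay Pending until it also tolerates the taint.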

We can also use "preferred" affinity. This is a soft preference, not a hard requirement. The scheduler will try its best to satisfy it but will schedule the pod elsewhere if it can’t.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker
spec:
  containers:
  - name: worker
    image: gcr.io/my-project/gpu-worker:latest
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: gpu-type
            operator: In
            values:
            - high-performance
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

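Here, weight: 1 is the lowest possible value; weights range from 1 to 100. For each candidate node, the scheduler adds up the weights of the preferred terms that node satisfies, so terms with higher weights pull more strongly. With a single preferred term the exact value makes little practical difference; weights matter when you combine several preferred terms. This is useful when you have multiple node types and want to steer pods toward a specific one but still allow them to run elsewhere if necessary.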
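As a rough model of how preferences are weighed, the sketch below adds up the weights of the preferred terms each node satisfies. The second term and the gpu-type=standard label are hypothetical, only the In operator is handled, and the real scheduler additionally normalizes these sums and combines them with its other scoring criteria, so treat this as an illustration only.

```python
def preference_score(preferred_terms: list, labels: dict) -> int:
    """Sum the weights of the preferred terms a node satisfies (raw, unnormalized)."""
    score = 0
    for term in preferred_terms:
        exprs = term["preference"]["matchExpressions"]
        for e in exprs:
            if e["operator"] != "In":
                raise ValueError("only the In operator is modelled here")
        # Every expression in a term must hold for its weight to count.
        if exprs and all(labels.get(e["key"]) in e["values"] for e in exprs):
            score += term["weight"]
    return score


preferred = [
    {"weight": 80, "preference": {"matchExpressions": [
        {"key": "gpu-type", "operator": "In", "values": ["high-performance"]}]}},
    # Hypothetical second preference for a mid-range GPU tier.
    {"weight": 20, "preference": {"matchExpressions": [
        {"key": "gpu-type", "operator": "In", "values": ["standard"]}]}},
]

# Hypothetical node labels.
nodes = {
    "gpu-node-1": {"gpu-type": "high-performance"},
    "gpu-node-3": {"gpu-type": "standard"},
    "cpu-node-1": {},
}

scores = {name: preference_score(preferred, labels) for name, labels in nodes.items()}
print(scores)                       # {'gpu-node-1': 80, 'gpu-node-3': 20, 'cpu-node-1': 0}
print(max(scores, key=scores.get))  # gpu-node-1
```

Note that cpu-node-1 still gets a score of 0 rather than being excluded: a preference only ranks nodes, it never filters them out.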

Keep in mind that requiredDuringSchedulingIgnoredDuringExecution is enforced only at scheduling time. If a node’s labels are changed after the pod has been placed there, the pod simply keeps running on that node; that is exactly what the IgnoredDuringExecution suffix means. If you need running pods removed from a node, use a taint with the NoExecute effect, which evicts pods that don’t tolerate it.
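The IgnoredDuringExecution behavior can be illustrated with a toy model in which required affinity is checked only at the moment a pod is bound to a node. The ToyCluster class and its methods are hypothetical; they are not part of any Kubernetes client library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyCluster:
    """Hypothetical toy model: required affinity is checked only at bind time."""
    node_labels: Dict[str, Dict[str, str]]
    bindings: Dict[str, str] = field(default_factory=dict)  # pod name -> node name

    def schedule(self, pod: str, key: str, allowed: List[str]) -> Optional[str]:
        # "RequiredDuringScheduling": only nodes whose label matches are eligible.
        for node, labels in self.node_labels.items():
            if labels.get(key) in allowed:
                self.bindings[pod] = node
                return node
        return None  # no eligible node; a real pod would stay Pending

    def relabel(self, node: str, key: str, value: Optional[str]) -> None:
        # "IgnoredDuringExecution": existing bindings are not re-checked.
        if value is None:
            self.node_labels[node].pop(key, None)
        else:
            self.node_labels[node][key] = value


cluster = ToyCluster({"gpu-node-1": {"gpu-type": "high-performance"}})
print(cluster.schedule("gpu-worker", "gpu-type", ["high-performance"]))    # gpu-node-1
cluster.relabel("gpu-node-1", "gpu-type", None)  # cf. kubectl label nodes <node> gpu-type-
print(cluster.bindings)                                                    # {'gpu-worker': 'gpu-node-1'}
print(cluster.schedule("gpu-worker-2", "gpu-type", ["high-performance"]))  # None
```

The existing pod stays bound after the label is removed, while a new pod with the same requirement can no longer be placed.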

The next step is to explore inter-pod affinity, where you schedule pods based on the labels of other pods already running in the same topology domain, such as a node or a zone.
