A higher-priority Pod can evict a lower-priority Pod from a Kubernetes Node to make room for itself.
This is not a theoretical concept; it’s how Kubernetes ensures critical workloads get the resources they need, even when the cluster is full. Imagine you have a set of microservices running, some more critical than others. If a new, high-priority service needs to start and there’s no available capacity, Kubernetes won’t simply leave it pending. The scheduler looks for running Pods with a lower priority and, if removing them from a Node would free enough requested resources for the new Pod to fit, it terminates those lower-priority Pods. This process is called "preemption."
Let’s see this in action.
First, we need to define priority classes.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
globalDefault: false
description: "This priority class should be used for critical system components."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
globalDefault: false
description: "This priority class is for non-critical batch jobs."
Now, let’s create two Pods, one with high priority and one with low priority. We’ll also set resource requests so they contend for resources.
apiVersion: v1
kind: Pod
metadata:
  name: low-priority-pod
  labels:
    app: low-priority
spec:
  containers:
  - name: nginx
    image: nginx:alpine
    resources:
      # The scheduler places Pods based on requests, not on actual usage.
      requests:
        cpu: "500m"
        memory: "256Mi"
  # Must match the name of an existing PriorityClass.
  priorityClassName: low-priority
---
apiVersion: v1
kind: Pod
metadata:
  name: high-priority-pod
  labels:
    app: high-priority
spec:
  containers:
  - name: stress
    image: alpine/stress
    resources:
      requests:
        cpu: "600m"
        memory: "300Mi"
    # One CPU worker and one memory worker allocating 200 MB, for one hour.
    args:
    - "--cpu"
    - "1"
    - "--vm"
    - "1"
    - "--vm-bytes"
    - "200m"
    - "--timeout"
    - "3600s"
  priorityClassName: high-priority
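Save each Pod manifest to its own file (the commands below assume low-priority-pod.yaml and high-priority-pod.yaml), and make sure both PriorityClass objects have been applied first, because a Pod that references a PriorityClass that doesn’t exist is rejected. If we deploy low-priority-pod first, it will likely get scheduled onto a Node.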
kubectl apply -f low-priority-pod.yaml
Now, imagine our cluster is already at capacity, and we try to deploy high-priority-pod.
kubectl apply -f high-priority-pod.yaml
If high-priority-pod requests more resources than are currently free on any single Node, and low-priority-pod is running on a Node that could accommodate high-priority-pod once low-priority-pod is removed, the Kubernetes scheduler will preempt low-priority-pod. The scheduler first determines that high-priority-pod cannot be scheduled because of insufficient resources. It then looks for Nodes where removing lower-priority Pods would free enough resources for high-priority-pod to fit. Among the candidates for eviction, it picks Pods with lower priority values (the integer value of their PriorityClass, not the class name itself). The victim gets its normal graceful termination period; once low-priority-pod has terminated, the resources it had requested become available, and high-priority-pod can be scheduled and run.
The core problem preemption solves is ensuring that critical components, like core Kubernetes services (e.g., kube-scheduler, kube-controller-manager, etcd) or essential application workloads, remain available and can start even under heavy resource pressure. Without preemption, a flood of low-priority, resource-intensive workloads could starve critical system components, leading to cluster instability or complete failure.
The mechanics involve the Kubernetes scheduler. When a Pod is created, the scheduler attempts to find a Node for it. If no Node has sufficient unreserved resources, the scheduler enters a "preemption" phase. It identifies candidate victims based on their priority (the integer resolved from priorityClassName) and the resource requests of the pending Pod. It then selects a Node where removing one or more lower-priority Pods frees enough resources to satisfy the pending Pod’s requests. The victims are terminated, and the scheduler retries scheduling the pending Pod.
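While the victims are shutting down, the pending Pod usually carries a hint about where the scheduler intends to place it. The excerpt below is a sketch of what kubectl get pod high-priority-pod -o yaml might show at that moment. The status.nominatedNodeName field is part of the Pod API; the node name worker-1 is a made-up placeholder.

```yaml
# Status excerpt of the pending Pod during preemption (illustrative only).
status:
  phase: Pending
  # Node the scheduler has nominated for this Pod. "worker-1" is a
  # hypothetical name, and the Pod is not guaranteed to end up there.
  nominatedNodeName: worker-1
```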
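The value field in PriorityClass is an integer. Higher numbers mean higher priority; user-defined classes can use values up to 1,000,000,000, and larger values are reserved for built-in system classes. A Pod can only preempt Pods with a strictly lower priority, so preemption never happens between two Pods of the same priority. globalDefault: true can be set on a single PriorityClass to assign that priority to any Pod that doesn’t explicitly specify one, which is useful for ensuring a baseline level of priority for all workloads. Without a global default, such Pods get priority 0.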
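As a rough illustration of a cluster-wide default, the manifest below sketches a PriorityClass that applies to Pods without a priorityClassName. The name default-priority and the value 500 are assumptions chosen to sit between the low-priority and high-priority classes defined earlier, not values the walkthrough depends on.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority   # hypothetical name
value: 500                 # assumed: between low-priority (100) and high-priority (1000)
globalDefault: true        # applies to Pods that set no priorityClassName
description: "Baseline priority for Pods that do not specify a priority class."
```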
What most people don’t realize is that preemption isn’t always a single, isolated event. To make room for one high-priority Pod, the scheduler may have to evict several lower-priority Pods from the same Node. If a controller such as a Deployment recreates the evicted Pods, they go back into the scheduling queue and may in turn preempt Pods with an even lower priority elsewhere. This can cascade, and it’s why carefully managing your PriorityClass values and resource requests is crucial to avoid unintended disruption.
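One possible way to limit this kind of knock-on disruption, offered here as an option rather than a rule, is a non-preempting PriorityClass. Pods in such a class are placed ahead of lower-priority Pods in the scheduling queue but never evict running Pods. The preemptionPolicy field is part of the PriorityClass API; the class name and value below are hypothetical.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: medium-priority-nonpreempting   # hypothetical name
value: 750                              # assumed value for illustration
preemptionPolicy: Never                 # queue ahead of lower priorities, but never evict
globalDefault: false
description: "Pods in this class wait for free capacity instead of preempting others."
```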
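The next thing you’ll likely encounter is understanding how PodDisruptionBudgets interact with preemption. The scheduler tries to pick victims whose eviction would not violate a PodDisruptionBudget, but only on a best-effort basis, so a budget reduces rather than eliminates the chance that your high-priority Pods are preempted by even higher-priority Pods.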
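A minimal sketch of such a budget, assuming the app: high-priority label from the Pod manifest earlier; the budget name is hypothetical. Treat it as a preference the scheduler tries to respect when choosing victims, not as a hard guarantee.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: high-priority-pdb   # hypothetical name
spec:
  # Ask Kubernetes to keep at least one matching Pod running.
  minAvailable: 1
  selector:
    matchLabels:
      app: high-priority    # label taken from high-priority-pod above
```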