Kubernetes Pod Priority is the system’s way of saying "some pods are more important than others, and if there aren’t enough resources to go around, the important ones get to play first."

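Let’s see it in action. Imagine you have two namespaces, critical and batch, and two PriorityClass objects. PriorityClasses are cluster-scoped, so they are defined once and shared by both namespaces: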

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-workload
value: 1000000
globalDefault: false
description: "This priority class should be used for critical workloads."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-workload
value: 100
globalDefault: false
description: "This priority class should be used for batch workloads."

Now, let’s create a pod in each namespace and assign each one a PriorityClass through priorityClassName.

Critical Pod:

apiVersion: v1
kind: Pod
metadata:
  name: critical-app-1
  namespace: critical
spec:
  containers:
  - name: main
    image: nginx:latest
    resources:
      requests:
        memory: "1Gi"
        cpu: "1"
      limits:
        memory: "2Gi"
        cpu: "2"
  priorityClassName: critical-workload

Batch Pod:

apiVersion: v1
kind: Pod
metadata:
  name: batch-job-1
  namespace: batch
spec:
  containers:
  - name: main
    image: ubuntu:latest
    command: ["sleep", "3600"]
    resources:
      requests:
        memory: "1Gi"
        cpu: "1"
      limits:
        memory: "2Gi"
        cpu: "2"
  priorityClassName: batch-workload

If your cluster has just enough free resources for one of these pods and both are pending at the same time, critical-app-1 gets scheduled and batch-job-1 stays in the Pending state. The scheduler (kube-scheduler) orders its queue of pending pods by the value of each pod’s PriorityClass, so pods with higher values are tried first. Note that "enough resources" is measured against each pod’s requests (1 CPU and 1Gi of memory here), not its limits.

The problem Pod Priority solves is resource contention. In a dynamic environment like Kubernetes, nodes can fill up or fail. When that happens, the pods your business depends on (like your main API or a critical database) should get a spot first, even if it means evicting less important workloads. This is achieved through preemption: if a higher-priority pod cannot be scheduled because no node has sufficient free resources, the scheduler can preempt (evict) one or more lower-priority pods from a node to make room.
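As a rough illustration of what preemption looks like from the outside: when the scheduler preempts pods on a node to make room for a pending pod, it records that node in the pending pod’s status.nominatedNodeName field. The fragment below is a sketch of the relevant parts of critical-app-1 while the preempted pods are shutting down. It is not a complete manifest, and the node name node-1 is hypothetical.

```yaml
# Illustrative fragment of a pending pod after preemption was triggered.
# node-1 is a hypothetical node name, not taken from a real cluster.
apiVersion: v1
kind: Pod
metadata:
  name: critical-app-1
  namespace: critical
spec:
  priorityClassName: critical-workload
  priority: 1000000          # integer resolved from the PriorityClass value
status:
  phase: Pending             # still waiting while lower-priority pods terminate
  nominatedNodeName: node-1  # node where the scheduler is making room
```

The nomination is not a guarantee. If another node frees up first, the pod may be scheduled there instead.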

Here’s how it works under the hood:

  1. PriorityClass Definition: You define PriorityClass objects. These are cluster-scoped resources. Each PriorityClass has a value (an integer) and an optional preemptionPolicy. The value determines the relative priority; higher numbers are more important. globalDefault: true can set a default priority for pods that don’t specify one.

  2. Pod Assignment: When you create a pod, you can assign it a priorityClassName. If you don’t, it gets the value of the PriorityClass marked globalDefault: true, or 0 if no such class exists. The name is resolved to an integer when the pod is created, and that integer is stored in the pod’s spec.priority field.

  3. Scheduling Decision: When kube-scheduler needs to place a pod, it considers:

    • Resource Availability: Does the node have enough CPU, memory, etc.?
    • Pod Priority: How important is this pod compared to others?
    • Preemption: If there aren’t enough resources, can this pod preempt lower-priority pods?

  4. Preemption Logic: If a high-priority pod is pending and cannot be scheduled, kube-scheduler looks for a node where preempting one or more existing pods would free up enough resources for it. Only pods with a lower priority than the pending pod are candidates. The scheduler selects a node and evicts the lowest-priority pods on it until the high-priority pod fits. The preemptionPolicy on the PriorityClass can influence this:

    • PreemptLowerPriority (default): Allows preemption.
    • Never: Pods of this priority class will not preempt other pods. They are still placed ahead of lower-priority pods in the scheduling queue, and they can still be preempted by higher-priority pods.
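To make the preemptionPolicy field concrete, here is a minimal sketch of a non-preempting PriorityClass. In the Kubernetes API the non-preempting value is Never. The class name and the value 5000 are hypothetical choices for illustration, picked only so the class sits between batch-workload (100) and critical-workload (1000000).

```yaml
# Sketch of a non-preempting PriorityClass.
# The name and value are hypothetical, chosen for illustration only.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-urgent-nonpreempting
value: 5000
preemptionPolicy: Never   # queue ahead of lower priorities, but never evict them
globalDefault: false
description: "Batch work that should be scheduled early but must not evict running pods."
```

Pods using this class would be tried before batch-workload pods when capacity frees up, but they would wait rather than evict anything that is already running.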

The globalDefault: true setting on a PriorityClass is a powerful lever. If you set globalDefault: true on a PriorityClass with a low value (e.g., 0), and then don’t assign any priorityClassName to your pods, they will all get that low priority. This makes it easy to ensure that any explicitly prioritized workloads will preempt these "default" pods if necessary.

If you have a globalDefault PriorityClass and a pod that doesn’t specify a priorityClassName, the pod is assigned the value of that PriorityClass when it is created. Pods that already exist keep the priority they have. Only one PriorityClass in the cluster can have globalDefault: true, so there is never a choice to make between several defaults.
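A minimal sketch of that setup might look like the following. The names default-low and unprioritized-app are hypothetical. The pod deliberately omits priorityClassName so that it picks up the cluster default.

```yaml
# Sketch of a cluster-wide default PriorityClass (the name is hypothetical).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-low
value: 0
globalDefault: true
description: "Default priority for pods that do not set priorityClassName."
---
# A pod that names no PriorityClass; it is admitted with the default above.
apiVersion: v1
kind: Pod
metadata:
  name: unprioritized-app
  namespace: batch
spec:
  containers:
  - name: main
    image: nginx:latest
```

After creation, the resolved integer shows up in the pod’s spec.priority field, which is a quick way to confirm which priority a pod actually received.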

The next concept you’ll likely encounter is managing resource requests and limits effectively to complement your priority strategy.
