Kubernetes resource requests and limits are the knobs you turn to prevent your cluster from becoming a chaotic, resource-starved mess, but they’re often set with a "set it and forget it" mentality that leads to subtle performance degradation and unexpected evictions.

Let’s see this in action. Imagine a simple Deployment for a web application.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      containers:
      - name: web
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"

In this example, each nginx container requests 100 millicores (100m) of CPU and 128 mebibytes (128Mi) of memory. A request is the amount the scheduler reserves for the container on a node, not a minimum the container has to consume. The limits cap each container at 200m of CPU and 256Mi of memory.

The problem this solves is resource contention. Without requests, the scheduler has no idea how much a pod needs, so it may place the pod on a node that is already struggling, degrading performance for every pod on that node. Limits keep a single runaway process from hogging the node’s resources and starving critical system processes or other applications.

Internally, Kubernetes uses these values for two primary functions:

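  1. Scheduling: The scheduler uses requests to find a node with enough unreserved CPU and memory to accommodate the pod. A pod fits only if the sum of the requests of the pods already on the node, plus the new pod’s own requests, stays within the node’s allocatable resources. This check is based on requests, not on what the pods are actually using at the moment.
  2. Resource Management (QoS Classes): requests and limits determine a pod’s Quality of Service (QoS) class, which influences which pods are evicted first when a node runs low on resources.
    • Guaranteed: every container sets requests == limits for both CPU and memory. These pods are the last to be evicted.
    • Burstable: at least one container sets a request or limit, but the pod does not meet the Guaranteed criteria (for example, requests < limits, or only one of the two resources is set). These pods are evicted before Guaranteed pods.
    • BestEffort: no container sets any requests or limits. These pods are the first to be evicted.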
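
The two functions above can be sketched in a few lines of Python. This is a simplified model, not the scheduler’s or kubelet’s actual code: the helper names fits_on_node and qos_class are hypothetical, resources are plain dicts with CPU in millicores and memory in MiB, and init containers and other resource types are ignored.

```python
def fits_on_node(pod_requests, scheduled_requests, allocatable):
    """Return True if the pod's requests fit next to the pods already on the node."""
    for resource, capacity in allocatable.items():
        already_requested = sum(r.get(resource, 0) for r in scheduled_requests)
        if already_requested + pod_requests.get(resource, 0) > capacity:
            return False
    return True


def qos_class(containers):
    """Classify a pod from its containers' requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is given, Kubernetes defaults the request to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# The container from the Deployment above (CPU in millicores, memory in MiB).
web = {"requests": {"cpu": 100, "memory": 128},
       "limits": {"cpu": 200, "memory": 256}}

print(qos_class([web]))  # Burstable, because requests < limits

# Hypothetical node with 250m of allocatable CPU and two replicas already placed:
# 200m already requested + 100m for the new pod exceeds 250m.
print(fits_on_node(web["requests"], [web["requests"]] * 2,
                   {"cpu": 250, "memory": 1024}))  # False
```

Note that the example Deployment lands in the Burstable class. Setting its limits equal to its requests would make it Guaranteed.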

The exact levers you control are the cpu and memory fields within requests and limits. CPU is measured in cores (e.g., 1 for one full core) or millicores (100m for 1/10th of a core). Memory is measured in bytes, usually written with a binary suffix: kibibytes (Ki), mebibytes (Mi), gibibytes (Gi), etc. Decimal suffixes (k, M, G) are also accepted, so 128M and 128Mi are not the same amount.
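
To make the units concrete, here is a minimal sketch that converts quantity strings into plain numbers. The function names are hypothetical and only a subset of the suffixes Kubernetes accepts is handled, so treat it as an illustration rather than a full quantity parser.

```python
from decimal import Decimal

# Subset of the suffixes Kubernetes accepts for memory quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "100m", "1" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Decimal avoids float rounding surprises for values like "0.3".
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" or "128M" to bytes."""
    for suffix, factor in {**BINARY_SUFFIXES, **DECIMAL_SUFFIXES}.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # no suffix: already in bytes


print(cpu_to_millicores("100m"))   # 100
print(cpu_to_millicores("0.5"))    # 500
print(memory_to_bytes("128Mi"))    # 134217728
print(memory_to_bytes("128M"))     # 128000000
```

The last two lines show why the suffix matters: 128Mi is about 4.9% larger than 128M.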

The most surprising true thing about resource management in Kubernetes is that CPU limits are enforced via throttling, not by killing the process. If a container tries to use more CPU than its limit, it is not terminated; the Linux kernel simply stops scheduling it until the next enforcement period begins. A container can therefore run right up to its CPU limit, but any demand beyond that shows up as increased latency and reduced throughput rather than as a crash. Memory works differently: a container that exceeds its memory limit is OOMKilled (Out Of Memory killed).
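
The cost of throttling can be estimated with some simple arithmetic. The sketch below assumes the default 100 ms enforcement period, a positive CPU limit, and a single always-runnable thread that starts at the beginning of a period; the function names are hypothetical and real workloads will behave less neatly.

```python
DEFAULT_PERIOD_US = 100_000  # assumed default enforcement period: 100 ms


def cfs_quota_us(limit_millicores: int, period_us: int = DEFAULT_PERIOD_US) -> int:
    """CPU time the container may consume per period, in microseconds."""
    return limit_millicores * period_us // 1000


def wall_time_us(cpu_work_us: int, limit_millicores: int,
                 period_us: int = DEFAULT_PERIOD_US) -> int:
    """Wall-clock time for one thread to finish cpu_work_us of CPU work."""
    if cpu_work_us <= 0:
        return 0
    quota = cfs_quota_us(limit_millicores, period_us)
    # Number of periods in which the quota ran out before the work was done.
    throttled_periods = (cpu_work_us - 1) // quota
    remaining = cpu_work_us - throttled_periods * quota
    return throttled_periods * period_us + remaining


print(cfs_quota_us(200))          # 20000 -> 20 ms of CPU per 100 ms period
print(wall_time_us(50_000, 200))  # 210000 -> 50 ms of work takes 210 ms
```

Under these assumptions, a request that needs 50 ms of CPU takes 210 ms with the 200m limit from the example, because the container runs for 20 ms, waits out the rest of the period, and repeats. Nothing is killed; the request just gets slower.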

The next concept you’ll likely grapple with is how to effectively monitor and tune these requests and limits in a dynamic environment, especially for applications with variable workloads.
