K8s Resource Limits vs. Requests: The Real Difference

Kubernetes resource limits are less about preventing abuse and more about ensuring predictable performance for your applications.

Let’s see this in action. Imagine a simple Deployment for a web server:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"

Here, we’re telling Kubernetes that each nginx container requests 100 millicores (0.1 of a CPU core) and 128 MiB of memory, which is the amount reserved for it when the pod is scheduled. Critically, it’s also capped at a maximum of 200 millicores and 256 MiB.

The problem this solves isn’t malicious users hogging resources. It’s about the inherent variability of applications. A web server might get a sudden surge of traffic, causing it to momentarily need more CPU. Without limits, one pod can take every spare CPU cycle on a node, leaving other pods with little more than the share their requests entitle them to and making their latency unpredictable. Similarly, an unchecked memory leak can exhaust the node’s memory, at which point the kernel’s OOM killer or kubelet eviction may take down other pods, not just the leaking one.

Resource requests are what the Kubernetes scheduler uses to decide where to place your pod. It looks for a node whose allocatable CPU and memory, minus the requests of the pods already placed there, can still cover the new pod’s requests. The scheduler compares requests, not live usage. This ensures that, even when the node is busy, your pods have a baseline of resources reserved for them.
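The fit check described above can be sketched in a few lines. This is a simplified model, not the scheduler’s actual code: the real scheduler applies many other filters and scoring rules, and the names `Resources` and `fits_on_node`, as well as the node and pod sizes, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits_on_node(allocatable, scheduled_requests, pod_request):
    """Return True if the pod's requests fit in the node's unreserved capacity."""
    # Only requests are summed; actual usage plays no part in this check.
    reserved_cpu = sum(r.cpu_millicores for r in scheduled_requests)
    reserved_mem = sum(r.memory_mib for r in scheduled_requests)
    return (
        reserved_cpu + pod_request.cpu_millicores <= allocatable.cpu_millicores
        and reserved_mem + pod_request.memory_mib <= allocatable.memory_mib
    )


# Hypothetical node with 1 CPU core and 1 GiB allocatable.
node = Resources(cpu_millicores=1000, memory_mib=1024)
# Requests of pods already placed on the node (illustrative values).
existing = [Resources(300, 256)] * 3
# Requests taken from the nginx manifest: 100m CPU, 128Mi memory.
nginx = Resources(cpu_millicores=100, memory_mib=128)

print(fits_on_node(node, existing, nginx))            # True
print(fits_on_node(node, existing + [nginx], nginx))  # False
```

The second call fails on CPU alone: 1000m is already reserved, so another 100m request cannot be placed even if the running pods are idle.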

Resource limits, on the other hand, are enforced on the node by the Linux kernel, using Control Groups (cgroups) that the kubelet and the container runtime (like containerd or Docker) configure. A CPU limit of 200m means the container’s processes can use at most 20% of one CPU core’s worth of time. With the default 100 ms scheduling period, that is 20 ms of CPU time per period, summed across all cores. If the container tries to exceed this, its CPU usage will be throttled, making it slower. It won’t be killed, just less responsive.
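To make the throttling arithmetic concrete, here is a minimal sketch of how a CPU limit in millicores maps to a CFS quota, assuming the default 100 ms period. The function names are hypothetical; the output format follows the cgroup v2 `cpu.max` file, which holds the quota and the period in microseconds.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms


def cpu_limit_to_quota_us(limit_millicores, period_us=CFS_PERIOD_US):
    """CPU time (in microseconds) the container may use per period."""
    return limit_millicores * period_us // 1000


def cpu_max_line(limit_millicores, period_us=CFS_PERIOD_US):
    """Render the value as it would appear in a cgroup v2 cpu.max file."""
    return f"{cpu_limit_to_quota_us(limit_millicores, period_us)} {period_us}"


print(cpu_limit_to_quota_us(200))  # 20000 (20 ms out of every 100 ms)
print(cpu_max_line(200))           # 20000 100000
print(cpu_max_line(1500))          # 150000 100000
```

A limit above 1000m simply yields a quota larger than the period, which a multi-threaded container can spend across several cores at once.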

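Memory limits are also enforced via cgroups. Memory can’t be throttled the way CPU can, so if a container tries to allocate beyond its memory limit, the kernel’s OOM killer is invoked within that container’s cgroup and terminates a process in it. Kubernetes then reports the container as OOMKilled and restarts it according to the pod’s restart policy. This is a hard stop, preventing runaway memory consumption from impacting the entire node.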
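The cgroup memory limit is ultimately a byte count, so it can help to see what a quantity like 256Mi turns into. The parser below is an illustrative sketch, not Kubernetes’ own quantity parser: the function name is hypothetical, and it handles only whole numbers with the common binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes.

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
DECIMAL_SUFFIXES = {"k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}


def memory_quantity_to_bytes(quantity):
    """Convert a whole-number memory quantity such as '256Mi' to bytes."""
    for suffixes in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in suffixes.items():
            if quantity.endswith(suffix):
                return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: the value is already in bytes


print(memory_quantity_to_bytes("128Mi"))  # 134217728
print(memory_quantity_to_bytes("256Mi"))  # 268435456
print(memory_quantity_to_bytes("500M"))   # 500000000
```

Note the difference between Mi and M: 256Mi is about 268 MB, so mixing up the two suffixes changes the limit by roughly 5%.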

The interplay between requests and limits is key. If requests are set too low, the scheduler packs more pods onto a node than it can comfortably serve, leading to resource contention. If requests are set too high, pods may sit in Pending because no node has enough unreserved capacity, and the reserved resources go unused. If limits are set too low, legitimate spikes in application demand will be throttled or cause OOMKills, leading to poor performance and instability.

A common misconception is that setting limits is only for "bad" applications. In reality, even well-behaved applications can experience temporary resource spikes. For instance, during a cache warm-up, a heavy database query, or a sudden burst of user activity, an application might temporarily require more CPU or memory. Resource limits, when set appropriately, allow the application to burst above its request and up to its limit, providing good performance during these spikes, while preventing it from consuming excessive resources and impacting other workloads.

When setting these values, it’s crucial to profile your application under realistic load. Tools like kubectl top pods (which needs the Metrics Server running in the cluster) and Prometheus can give you insights into actual resource usage. A good starting point is to set requests to your application’s typical baseline usage and limits to a value that accommodates expected peak usage without causing instability. For CPU, requests are often set to the average usage, and limits to 1.5x or 2x that average. For memory, requests are usually set to the stable baseline, and limits to the maximum observed usage during testing, plus a small buffer.
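The rules of thumb above can be turned into a small helper. This is only a sketch of that heuristic: the function name, the default 2x CPU burst factor, the 10% memory buffer, and the sample measurements are all assumptions for illustration, not recommended values for any particular workload.

```python
import math


def suggest_resources(avg_cpu_m, baseline_mem_mib, peak_mem_mib,
                      cpu_burst_factor=2.0, mem_buffer_percent=10):
    """Derive requests and limits from measured usage (all inputs are integers)."""
    cpu_limit = math.ceil(avg_cpu_m * cpu_burst_factor)
    # Integer ceiling division avoids floating-point rounding surprises.
    mem_limit = -(-peak_mem_mib * (100 + mem_buffer_percent) // 100)
    return {
        "requests": {"cpu": f"{avg_cpu_m}m", "memory": f"{baseline_mem_mib}Mi"},
        "limits": {"cpu": f"{cpu_limit}m", "memory": f"{mem_limit}Mi"},
    }


# Hypothetical measurements: 100m average CPU, 128 MiB steady memory,
# 230 MiB peak memory observed during load testing.
print(suggest_resources(100, 128, 230))
# {'requests': {'cpu': '100m', 'memory': '128Mi'},
#  'limits': {'cpu': '200m', 'memory': '253Mi'}}
```

With these sample numbers the result lands close to the values in the nginx manifest; rounding the memory limit up to 256Mi would be a reasonable final step.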

The next concept to grapple with is how these limits interact with Quality of Service (QoS) classes: Guaranteed, Burstable, and BestEffort.