Resource quotas and limits are your primary tools for preventing resource contention and ensuring fair usage in Kubernetes, especially in a multi-team GKE environment. Without them, a single team’s runaway application can starve others, leading to cascading failures and an unstable cluster.

Let’s see this in action. Imagine we have a cluster with two teams: frontend and backend.

Here’s a ResourceQuota object that sets limits for the frontend namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: frontend-quota
  namespace: frontend
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    services: "20"
    persistentvolumeclaims: "10"

And for the backend namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: backend-quota
  namespace: backend
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
    services: "30"
    persistentvolumeclaims: "20"

When a pod is created in the frontend namespace, Kubernetes adds the pod’s CPU and memory requests to what the namespace already uses. If the total would exceed requests.cpu (10 CPU cores) or requests.memory (20 GiB), the pod creation is rejected. The limits.cpu and limits.memory entries work the same way: they cap the sum of the limits declared by all pods in the namespace. Separately, each container’s own limit is enforced at runtime on the node. A container that tries to use more CPU than its limit is throttled, and one that exceeds its memory limit is terminated (OOMKilled).
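To make the accounting concrete, here is a minimal sketch of a pod that would count against frontend-quota. The pod name, container name, image, and resource values are hypothetical and chosen only for illustration:

```yaml
# Hypothetical pod in the frontend namespace (names and values are illustrative).
apiVersion: v1
kind: Pod
metadata:
  name: web-example
  namespace: frontend
spec:
  containers:
    - name: web
      image: nginx:1.27
      resources:
        requests:
          cpu: 250m       # counted against requests.cpu ("10")
          memory: 256Mi   # counted against requests.memory (20Gi)
        limits:
          cpu: 500m       # counted against limits.cpu ("20")
          memory: 512Mi   # counted against limits.memory (40Gi)
```

With these values, the CPU entries run out first: 10 cores / 250m = 40 pods by requests.cpu (and 20 cores / 500m = 40 by limits.cpu), while memory would allow 80 and the pods count allows 50. The 41st identical pod would be rejected.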

The core problem these solve is resource starvation. In a shared cluster, if one team deploys an application that doesn’t specify resource requests and limits, it might consume a large chunk of a node’s capacity. This leaves less for other pods, potentially causing them to be evicted or become unresponsive. Quotas prevent this by capping the total requests and limits declared across all pods within a namespace.

Here’s a breakdown of the key fields within a ResourceQuota:

  • requests.cpu: The sum of CPU requests for all containers in pods within the namespace cannot exceed this value. This is crucial for scheduling; Kubernetes uses requests to decide which node a pod can run on.
  • requests.memory: The sum of memory requests for all containers in pods within the namespace cannot exceed this value. Similar to CPU, this informs the scheduler.
  • limits.cpu: The sum of CPU limits for all containers in pods within the namespace cannot exceed this value. This caps the declared limits in aggregate, not live usage; at runtime each container is throttled at its own CPU limit.
  • limits.memory: The sum of memory limits for all containers in pods within the namespace cannot exceed this value. If a container exceeds its memory limit, it’s killed.
  • pods: The maximum number of pods that can be created in the namespace.
  • services: The maximum number of services that can be created.
  • persistentvolumeclaims: The maximum number of persistent volume claims that can be created.

You can also set quotas for specific storage classes (<storage-class-name>.storageclass.storage.k8s.io/requests.storage and <storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims) and for ephemeral storage (requests.ephemeral-storage, limits.ephemeral-storage).
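As a sketch of how these storage keys look in practice, the quota below caps storage for one storage class and for ephemeral storage. The quota name and all values are hypothetical, and premium-rwo is assumed here as the storage class name; substitute one that exists in your cluster. Note that the storage class name comes first in the key, followed by .storageclass.storage.k8s.io/requests.storage:

```yaml
# Hypothetical storage quota for the frontend namespace (values are illustrative).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: frontend-storage-quota
  namespace: frontend
spec:
  hard:
    # Total storage requested by PVCs that use the premium-rwo storage class.
    premium-rwo.storageclass.storage.k8s.io/requests.storage: 500Gi
    # Number of PVCs that may use the premium-rwo storage class.
    premium-rwo.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
    # Sum of ephemeral-storage requests and limits across all pods.
    requests.ephemeral-storage: 50Gi
    limits.ephemeral-storage: 100Gi
```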

The most surprising aspect of resource quotas is how they interact with pod creation. Once a quota constrains a compute resource such as requests.cpu or limits.memory, every new pod in that namespace must declare a value for that resource. A pod that omits it is not treated as requesting 0; it is rejected outright, unless a LimitRange in the namespace supplies a default. On a cluster with no quota, a pod without requests is scheduled as if it needs nothing, which leads to unexpected scheduling behavior. Always specify requests.

The next logical step after resource quotas is to explore LimitRanges. LimitRanges are applied per pod or container within a namespace, providing default values and enforcing minimums/maximums on individual resource requests and limits, ensuring that even if a team forgets to set them, the pods will still have sane defaults and won’t exceed certain bounds.
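A minimal sketch of such a LimitRange for the frontend namespace follows. The object name and every value are hypothetical; the only constraint to respect is that min ≤ defaultRequest ≤ default ≤ max for each resource:

```yaml
# Hypothetical LimitRange for the frontend namespace (values are illustrative).
apiVersion: v1
kind: LimitRange
metadata:
  name: frontend-limits
  namespace: frontend
spec:
  limits:
    - type: Container
      defaultRequest:   # request applied when a container sets none
        cpu: 100m
        memory: 128Mi
      default:          # limit applied when a container sets none
        cpu: 500m
        memory: 512Mi
      min:              # smallest request a container may declare
        cpu: 50m
        memory: 64Mi
      max:              # largest limit a container may declare
        cpu: "2"
        memory: 2Gi
```

With defaults like these in place, a pod that omits its resources section is still admitted under the namespace quota, because the defaults are filled in before the quota check runs.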
