Resource quotas and limits in Kubernetes, especially within a multi-team GKE environment, are your primary tool for preventing resource contention and ensuring fair usage. Without them, a single team’s runaway application can starve others, leading to cascading failures and an unstable cluster.
Let’s see this in action. Imagine a cluster shared by two teams, frontend and backend, each working in its own namespace.
Here’s a ResourceQuota object that caps resource usage in the frontend namespace:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: frontend-quota
  namespace: frontend
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    services: "20"
    persistentvolumeclaims: "10"
```
And for the backend namespace:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: backend-quota
  namespace: backend
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
    services: "30"
    persistentvolumeclaims: "20"
```
When a pod is created in the frontend namespace, the ResourceQuota admission check takes the pod’s CPU and memory requests, summed across all of its containers, and adds them to what the namespace already uses. If the total would exceed requests.cpu (10 CPU cores) or requests.memory (20 GiB), the pod is rejected. The limits.cpu and limits.memory entries work the same way, but on the sum of the pods’ declared limits. Each container’s own limits are also enforced at runtime. A container that tries to use more CPU than its limit is throttled. A container that exceeds its memory limit is terminated (OOMKilled).
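To make the arithmetic concrete, here is a minimal sketch of a pod that frontend-quota would charge for. The pod name web, the container names, the images and the resource values are all hypothetical. Only the namespace and the quota it is measured against come from the example above.

```yaml
# Hypothetical two-container pod in the frontend namespace.
apiVersion: v1
kind: Pod
metadata:
  name: web                  # illustrative name
  namespace: frontend
spec:
  containers:
    - name: app
      image: nginx:1.25      # illustrative image
      resources:
        requests:            # counted against requests.cpu / requests.memory
          cpu: "500m"
          memory: 1Gi
        limits:              # counted against limits.cpu / limits.memory
          cpu: "1"
          memory: 2Gi
    - name: log-shipper      # illustrative sidecar
      image: busybox:1.36
      command: ["sh", "-c", "while true; do sleep 3600; done"]
      resources:
        requests:
          cpu: "100m"
          memory: 128Mi
        limits:
          cpu: "200m"
          memory: 256Mi
```

Quota accounting sums across containers, so this pod counts as:

- 600m CPU and 1152Mi (1.125Gi) of memory against the request quotas.
- 1.2 CPU and 2304Mi (2.25Gi) against the limit quotas.

In an otherwise empty frontend namespace, 16 copies would fit, since 16 × 600m = 9.6 cores. A 17th copy would push requests.cpu to 10.2 and limits.cpu to 20.4, so it would be rejected, even though memory and the 50-pod cap still have room.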
The core problem these solve is resource starvation. In a shared cluster, an application deployed without resource requests and limits can consume a large share of a node’s capacity. That leaves less for other pods, which may then be evicted or become unresponsive. Quotas prevent this by capping the total requests and limits across all pods in a namespace. Because a compute quota also requires every pod to declare those values, no workload in that namespace can run unbounded.
Here’s a breakdown of the key fields within a ResourceQuota:
- requests.cpu: The sum of CPU requests for all containers in pods within the namespace cannot exceed this value. Requests are what the scheduler uses to decide which node can fit a pod, so this caps how much node capacity the namespace can reserve.
- requests.memory: The sum of memory requests for all containers in pods within the namespace cannot exceed this value. As with CPU, these requests inform the scheduler.
- limits.cpu: The sum of CPU limits for all containers in pods within the namespace cannot exceed this value. Because each container is throttled at its own limit, this bounds the namespace’s total CPU usage.
- limits.memory: The sum of memory limits for all containers in pods within the namespace cannot exceed this value. If an individual container exceeds its own memory limit, it is OOMKilled.
- pods: The maximum number of pods that can be created in the namespace.
- services: The maximum number of services that can be created in the namespace.
- persistentvolumeclaims: The maximum number of persistent volume claims that can be created in the namespace.
You can also set quotas per storage class (<storage-class-name>.storageclass.storage.k8s.io/requests.storage and <storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims) and for ephemeral storage (requests.ephemeral-storage, limits.ephemeral-storage).
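A sketch of those storage-related keys follows. It assumes a GKE cluster with a storage class named standard-rwo, which GKE commonly provides; substitute your own class name. The quota name and all values are illustrative. A namespace can hold several ResourceQuota objects, and each one is enforced, so this can sit alongside frontend-quota.

```yaml
# Illustrative only: assumes a StorageClass named "standard-rwo" exists.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: frontend-storage-quota   # hypothetical name
  namespace: frontend
spec:
  hard:
    # Total storage requested by all PVCs in the namespace, any class
    requests.storage: 500Gi
    # Storage and claim count for PVCs that use the standard-rwo class only
    standard-rwo.storageclass.storage.k8s.io/requests.storage: 200Gi
    standard-rwo.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
    # Sums of ephemeral-storage requests and limits across all containers
    requests.ephemeral-storage: 50Gi
    limits.ephemeral-storage: 100Gi
```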
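The most surprising aspect of resource quotas is how strictly they gate pod creation. Once a namespace has a quota on requests.cpu or requests.memory, every container in a new pod must declare those requests. A pod that omits them is rejected outright rather than counted as zero, unless a LimitRange fills in defaults. The same applies to limits when the quota tracks limits.cpu or limits.memory, as frontend-quota does. If a container sets limits but no requests, Kubernetes copies the limits into the requests. In a namespace without a quota, a container with no requests is scheduled as though it requested nothing, which can pack nodes unpredictably. Either way, always specify requests.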
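For illustration, here is a hypothetical pod that omits its resources block entirely. The pod name and image are assumptions. frontend-quota tracks all four compute keys, so with no LimitRange in the namespace, admission would refuse this pod instead of treating it as free.

```yaml
# Hypothetical pod: no resources block at all.
apiVersion: v1
kind: Pod
metadata:
  name: no-requests          # illustrative name
  namespace: frontend
spec:
  containers:
    - name: app
      image: nginx:1.25      # illustrative image
      # No requests or limits: because frontend-quota tracks requests.cpu,
      # requests.memory, limits.cpu and limits.memory, admission rejects this pod.
```

Two things would get this pod admitted: explicit requests and limits on the container, or defaults injected by a LimitRange.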
The next logical step after resource quotas is LimitRanges. A LimitRange applies to individual containers and pods within a namespace. It supplies default requests and limits and enforces minimum and maximum values. Even when a team forgets to set resources, its pods still get sane defaults and stay within bounds.
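Here is a minimal LimitRange sketch for the frontend namespace. The name frontend-defaults and all of the values are assumptions, chosen so that the defaults fit comfortably inside frontend-quota.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: frontend-defaults    # hypothetical name
  namespace: frontend
spec:
  limits:
    - type: Container
      defaultRequest:        # injected when a container sets no request
        cpu: "250m"
        memory: 256Mi
      default:               # injected when a container sets no limit
        cpu: "500m"
        memory: 512Mi
      min:                   # smallest values a container may declare
        cpu: "50m"
        memory: 64Mi
      max:                   # largest values a container may declare
        cpu: "2"
        memory: 4Gi
```

With this in place, a container that declares no resources is admitted with a 250m/256Mi request and a 500m/512Mi limit. frontend-quota then counts those values like any explicit ones. A container whose limit exceeds 2 CPU or 4Gi is rejected. LimitRanges are enforced at admission time, so they do not change pods that are already running.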