Your Kubernetes pods are reporting Readiness probe failed errors, meaning the kubelet on the node decided your pod wasn’t ready to receive traffic, so Kubernetes removed it from the endpoints of its Services. This isn’t a crash; the container is still running, but no new requests are routed to the pod until the probe passes again.
Here are the most common reasons readiness probes fail:
1. Application Not Listening on the Correct Port
The most common culprit is the application inside the pod not actually listening on the port specified in your readinessProbe definition. kubelet tries to connect to this port to check health.
- Diagnosis: Exec into the pod and use netstat -tulnp or ss -tulnp to see which ports your application is listening on.

      kubectl exec -it <your-pod-name> -- netstat -tulnp

- Fix: Update your readinessProbe in the Deployment/StatefulSet YAML to match the actual listening port. For example, if your app listens on 8080 but the probe is set to 80:

      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080 # Corrected port
        initialDelaySeconds: 5
        periodSeconds: 10

- Why it works: kubelet can now successfully establish a TCP connection to the port your application is bound to.
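If the image has neither netstat nor ss but does ship a Python 3 interpreter (an assumption about your image), a TCP connect test gives the same answer. This is a minimal sketch; port_is_open is a hypothetical helper name, not part of any Kubernetes tooling.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run inside the pod, port_is_open("127.0.0.1", 8080) returning False while port_is_open("127.0.0.1", 80) returns True would suggest the probe and the application disagree about the port.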
2. Application Not Responding to Health Check Endpoint
Even if listening on the right port, your application might not be serving a valid HTTP response (or any response) on the specified path.
- Diagnosis: Use curl from within the pod to hit the health check endpoint directly. The -i flag prints the response status line.

      kubectl exec -it <your-pod-name> -- curl -i http://localhost:<probe-port>/<probe-path>

  For example:

      kubectl exec -it <your-pod-name> -- curl -i http://localhost:8080/healthz

  Check for a 4xx or 5xx status code, or no output at all.
- Fix: Ensure your application’s health check endpoint returns an HTTP status code in the 2xx range (e.g., 200 OK) when it’s healthy. kubelet treats any code from 200 through 399 as success. If the endpoint returns 5xx, 4xx, or no response, the probe will fail.

      readinessProbe:
        httpGet:
          path: /status # Ensure this path is correctly implemented
          port: 8080
        # ... other settings

- Why it works: kubelet receives a successful HTTP status code, indicating the application is ready.
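For illustration, a health endpoint with this behavior can be sketched with Python’s standard library alone. The names ready, HealthHandler, and make_server are hypothetical, and the /healthz path and port 8080 are taken from the examples above; your application’s framework will have its own way of doing this.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once initialization is complete; the endpoint reports 503 until then.
ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            status = 200 if ready.is_set() else 503
            body = b"ok\n" if status == 200 else b"not ready\n"
        else:
            status, body = 404, b"not found\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep probe requests out of the application log.


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a server exposing the health endpoint."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

The application would call ready.set() when its startup work finishes and run make_server().serve_forever() in a background thread. Returning 503 before that point keeps the pod out of rotation without restarting it.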
3. Application is Slow to Start Up
Your application takes longer to become ready than the initialDelaySeconds configured in the probe. kubelet starts probing as soon as that delay has elapsed, but the app might still be initializing databases, loading caches, or performing other startup tasks.
- Diagnosis: Compare the pod’s startup logs (kubectl logs <your-pod-name>) with the probe failure events shown by kubectl describe pod <your-pod-name>. You’ll see the container start, but the readiness probe fails repeatedly until the application finishes initializing.
- Fix: Increase initialDelaySeconds to give your application ample time to initialize.

      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 60 # Increased from 5 to 60 seconds
        periodSeconds: 10

- Why it works: kubelet waits longer before its first probe attempt, ensuring the application has completed its essential startup routines.
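To see how startup time interacts with the probe schedule, here is a rough timing model. It assumes probes fire at exactly initialDelaySeconds and then every periodSeconds, which is a simplification of what kubelet actually does; first_successful_probe is a hypothetical helper.

```python
import math


def first_successful_probe(startup_seconds, initial_delay, period):
    """Seconds after container start at which the first probe can pass.

    Simplified model: probes fire at initial_delay, initial_delay + period,
    and so on, and a probe passes only if startup has finished by then.
    """
    if startup_seconds <= initial_delay:
        return initial_delay
    # Number of probes that fire before startup completes (and therefore fail).
    failed_probes = math.ceil((startup_seconds - initial_delay) / period)
    return initial_delay + failed_probes * period


# An app that needs 42 s to start, probed from 5 s onward every 10 s:
print(first_successful_probe(42, initial_delay=5, period=10))   # 45
# The same app with initialDelaySeconds raised to 60:
print(first_successful_probe(42, initial_delay=60, period=10))  # 60
```

Under this model, the short delay produces four failed probes (at 5, 15, 25, and 35 seconds) before the first success, while the long delay produces none, at the cost of the pod joining rotation later than strictly necessary.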
4. Network Policy Blocking Probe Traffic
If you have NetworkPolicies in place, they might be preventing kubelet from reaching the pod’s health check port. kubelet runs on the node as a host process, not as a pod, so probe traffic arrives from the node’s IP address. Most CNI plugins always allow traffic from a pod’s own node, but some configurations enforce policy on it.
- Diagnosis: Check your NetworkPolicy definitions for any rules that might restrict ingress to the pod on the probe port from the node’s IP, and check whether your CNI plugin exempts node-to-pod traffic from policy.
- Fix: Add a NetworkPolicy rule that explicitly allows ingress traffic from your nodes’ IP range on the probe port. A namespaceSelector for kube-system does not match probe traffic, because kubelet is not a pod in that namespace; use an ipBlock instead.

      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-health-checks
        namespace: default # Your pod's namespace
      spec:
        podSelector:
          matchLabels:
            app: your-app # Label matching your pod
        policyTypes:
          - Ingress
        ingress:
          - from:
              - ipBlock:
                  cidr: 10.0.0.0/16 # Placeholder: replace with your node CIDR
            ports:
              - protocol: TCP
                port: 8080 # Your probe port

- Why it works: This policy ensures that probe traffic originating from the node IPs can reach your pod on its readiness probe port.
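When allowing an IP range, it is easy to pick a CIDR that misses some nodes. The sketch below checks a list of node IPs (for example, the INTERNAL-IP column of kubectl get nodes -o wide) against a candidate range. The addresses shown are made-up examples, and uncovered_nodes is a hypothetical helper.

```python
import ipaddress


def uncovered_nodes(node_ips, allowed_cidr):
    """Return the node IPs that fall outside allowed_cidr."""
    network = ipaddress.ip_network(allowed_cidr)
    return [ip for ip in node_ips if ipaddress.ip_address(ip) not in network]


# Example addresses only; substitute the IPs of your own nodes.
nodes = ["10.0.1.5", "10.0.2.9", "192.168.1.4"]
print(uncovered_nodes(nodes, "10.0.0.0/16"))  # ['192.168.1.4']
```

Any address in the result belongs to a node whose probes the rule would not cover.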
5. Resource Constraints (CPU/Memory Throttling)
The pod is starved for resources, preventing the application from responding to the probe within the configured timeoutSeconds.
- Diagnosis: Check pod resource usage with kubectl top pod <your-pod-name> and look for CPU or memory consumption close to the container’s limits. Examine pod events (kubectl get events --field-selector involvedObject.name=<your-pod-name>) and the container’s last state in kubectl describe pod <your-pod-name> for OOMKilled terminations. CPU throttling does not appear in events; it shows up in container metrics such as container_cpu_cfs_throttled_periods_total.
- Fix: Increase the CPU and memory requests and limits for your container.

      resources:
        requests:
          memory: "256Mi"
          cpu: "200m"
        limits:
          memory: "512Mi"
          cpu: "500m"

- Why it works: Adequate resources allow the application process to execute its health check logic and respond to kubelet’s requests promptly.
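If no metrics pipeline is available, the kernel’s own throttling counters can be read from the container’s cgroup. On nodes using cgroup v2 this is typically /sys/fs/cgroup/cpu.stat inside the container (the path is an assumption that depends on your node setup). The sketch below parses that file’s contents; throttled_ratio is a hypothetical helper and the sample numbers are invented.

```python
def throttled_ratio(cpu_stat_text: str) -> float:
    """Fraction of CPU enforcement periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # No CPU limit set, or no periods elapsed yet.
    return stats.get("nr_throttled", 0) / periods


# Invented sample in the cgroup v2 cpu.stat layout.
SAMPLE = """usage_usec 8123456
user_usec 6000000
system_usec 2123456
nr_periods 1200
nr_throttled 300
throttled_usec 4500000
"""

print(throttled_ratio(SAMPLE))  # 0.25
```

A ratio that stays high while probes are timing out is a hint that the CPU limit, rather than the application logic, is what delays the health check response.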
6. Incorrect timeoutSeconds or periodSeconds
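The probe’s timeoutSeconds (default 1) is too short for your application to respond, so individual probes time out. Once failureThreshold consecutive probes fail (default 3), kubelet marks the pod as not ready. A long periodSeconds then delays recovery, because the pod stays not ready until the next probe succeeds.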
- Diagnosis: Observe the timing of probe failures. If your application sometimes responds but it takes longer than the timeout, this is the issue.
- Fix: Adjust timeoutSeconds and periodSeconds to be more forgiving. A common pattern is timeoutSeconds slightly longer than your expected response time, and periodSeconds at least twice that.

      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 15 # Increased from 10
        timeoutSeconds: 5 # Increased from 1

- Why it works: A longer timeout gives the application more time to process the request and send a response, while adjusting the period ensures probes are sent at a reasonable interval without overwhelming the application or missing transient issues.
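The interaction between the timeout and the thresholds is easier to follow in a small simulation. This is a simplified model of the readiness state machine, not kubelet’s actual code; readiness_timeline is a hypothetical helper and the latencies are invented.

```python
def readiness_timeline(latencies, timeout_seconds=1.0,
                       failure_threshold=3, success_threshold=1):
    """Simulate the Ready state after each probe.

    latencies: response time in seconds for each probe, in order
    (None means the application did not respond at all).
    """
    ready = False  # A container is not ready until a probe succeeds.
    successes = failures = 0
    timeline = []
    for latency in latencies:
        passed = latency is not None and latency < timeout_seconds
        if passed:
            successes, failures = successes + 1, 0
            if successes >= success_threshold:
                ready = True
        else:
            failures, successes = failures + 1, 0
            if failures >= failure_threshold:
                ready = False
        timeline.append(ready)
    return timeline


latencies = [0.2, 1.5, 1.5, 0.3, 2.0, 2.0, 2.0, 0.4]
print(readiness_timeline(latencies, timeout_seconds=1))
# [True, True, True, True, True, True, False, True]
print(readiness_timeline(latencies, timeout_seconds=5))
# [True, True, True, True, True, True, True, True]
```

With a 1-second timeout, the three consecutive 2-second responses take the pod out of rotation; with a 5-second timeout, the same application never leaves it.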
7. Liveness Probe is Incorrectly Configured (and restarting the container)
Less commonly, a misconfigured liveness probe is the real problem. When the liveness probe fails, kubelet restarts the container, and the restarted container is not ready again until its readiness probe passes. The result looks like repeated readiness failures even though the readiness probe itself is fine. The liveness probe does not override the readiness probe; kubelet still takes the pod out of service rotation based on the readiness probe alone.
- Diagnosis: Review both your livenessProbe and readinessProbe definitions carefully, and check the pod’s restart count with kubectl get pod <your-pod-name>.
- Fix: Ensure both probes are correctly configured and have appropriate initialDelaySeconds, periodSeconds, timeoutSeconds, and failureThreshold values.
- Why it works: Correctly configured probes accurately reflect the application’s state without causing unnecessary restarts or misinterpretations of readiness.
Once readiness probes pass, the next error you’re likely to hit is CrashLoopBackOff, if the underlying application issue is more severe than slow startup.