Pods on your nodes cannot reach the kube-dns or CoreDNS service because the kubelet is not configured with the cluster’s DNS IP address. The kubelet does not resolve names on behalf of pods; it writes that IP into each pod’s resolv.conf, so a wrong value prevents pods from resolving internal Kubernetes service names and external hostnames.
Here are the common reasons this breaks and how to fix them:
1. Incorrect clusterDNS IP in kubelet configuration
The kubelet needs to know where the cluster’s DNS service is located. This is set via the --cluster-dns flag or the clusterDNS field of its configuration file. If this IP is wrong, the pods the kubelet starts are pointed at an address where kube-dns/CoreDNS is not listening.
Diagnosis:
Check the kubelet configuration. On most systems using systemd, you can find this by running:
sudo systemctl cat kubelet
Look for a line like KUBELET_EXTRA_ARGS="--cluster-dns=10.96.0.10". The IP address 10.96.0.10 is the default for kube-dns in many cluster setups. You can also check the IP assigned to the kube-dns service itself:
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
Compare this IP with the one in your kubelet configuration.
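The comparison can be scripted. The sketch below is a minimal helper, assuming the flag appears as --cluster-dns=<ip> in the unit output; the function names extract_cluster_dns and compare_dns_ips are illustrative, not part of any Kubernetes tooling. If the value is set only in the kubelet configuration file, the first function prints nothing and the second says so.

```shell
# Print the first --cluster-dns value found in text read from stdin
# (for example, the output of `systemctl cat kubelet`).
extract_cluster_dns() {
    sed -n 's/.*--cluster-dns=\([0-9a-fA-F.:]*\).*/\1/p' | head -n 1
}

# Compare the kubelet's value ($1) with the DNS Service's clusterIP ($2).
compare_dns_ips() {
    if [ -z "$1" ]; then
        echo "no --cluster-dns flag found; check clusterDNS in the kubelet config file"
        return 2
    elif [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: kubelet=$1 service=$2"
        return 1
    fi
}
```

On a node with kubectl access, one way to call it is: compare_dns_ips "$(sudo systemctl cat kubelet | extract_cluster_dns)" "$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')"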
Fix:
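If the IP is incorrect, fix it where it is set: the clusterDNS field of the kubelet configuration file (kubeadm writes this file to /var/lib/kubelet/config.yaml) or the --cluster-dns flag in a systemd drop-in file (e.g., /etc/systemd/system/kubelet.service.d/10-kubeadm.conf). Note that on kubeadm clusters /etc/kubernetes/kubelet.conf is the kubelet’s kubeconfig and does not hold DNS settings. The value must match the actual clusterIP of your kube-dns or CoreDNS service.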
For example, if kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}' returns 10.100.0.10, ensure your kubelet config has --cluster-dns=10.100.0.10.
After updating the configuration, reload systemd if you edited a unit or drop-in file (sudo systemctl daemon-reload), then restart the kubelet service:
sudo systemctl restart kubelet
This works because kubelet injects this DNS IP into the resolv.conf file of every pod it manages, allowing them to correctly query the cluster DNS. Pods that were already running keep their old resolv.conf until they are recreated.
2. kube-dns or CoreDNS Pods Not Running or Crashing
The DNS service itself might be unhealthy. If the pods running kube-dns or CoreDNS aren’t running or are constantly restarting, DNS resolution will fail.
Diagnosis:
Check the status of the DNS pods:
kubectl get pods -n kube-system -l k8s-app=kube-dns
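CoreDNS deployed by kubeadm keeps the k8s-app=kube-dns label for compatibility, so the command above normally covers it. If it returns nothing, your installation may label the pods differently, for example: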
kubectl get pods -n kube-system -l k8s-app=coredns
Look for any pods that are not in a Running state, or that have a high restart count. Check their logs for errors:
kubectl logs <dns-pod-name> -n kube-system
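To pick out the problem pods quickly, the output of kubectl get pods can be filtered. This is a small sketch, assuming the default column layout (NAME, READY, STATUS, RESTARTS, AGE) with a header row; flag_unhealthy_pods is a hypothetical helper name, and the default threshold of 3 restarts is arbitrary.

```shell
# Read `kubectl get pods` output on stdin and print the pods that are not
# Running or whose restart count exceeds the threshold in $1 (default 3).
flag_unhealthy_pods() {
    awk -v max="${1:-3}" '
        NR == 1 { next }                       # skip the header row
        $3 != "Running" || $4 + 0 > max + 0 {  # STATUS is column 3, RESTARTS is column 4
            print $1, $3, "restarts=" $4
        }'
}
```

For example: kubectl get pods -n kube-system -l k8s-app=kube-dns | flag_unhealthy_pods 5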
Fix:
If pods are in CrashLoopBackOff or Error state, examine their logs for specific errors. Common issues include:
- Resource Limits: The pods might be hitting CPU or memory limits. You can adjust the resource requests/limits in their Deployment manifest.
- Configuration Errors: Incorrect configuration in ConfigMaps mounted by the DNS pods.
- Network Issues: The pods might not be able to communicate with the Kubernetes API server or other necessary components.
To fix, you might need to edit the Deployment for kube-dns or CoreDNS (e.g., kubectl edit deployment coredns -n kube-system), adjust resource limits, or fix the ConfigMap (e.g., kubectl edit configmap coredns -n kube-system). After applying changes, you might need to delete the problematic pods to force a restart:
kubectl delete pod <dns-pod-name> -n kube-system
This works because once the DNS pods are healthy and running again, the DNS service is actively listening and ready to respond to queries from other pods.
3. Network Policies Blocking DNS Traffic
If you have Network Policies defined in your cluster, they might be inadvertently blocking traffic to the kube-dns or CoreDNS pods. DNS typically uses UDP port 53.
Diagnosis:
Check if any Network Policies are applied to the kube-system namespace or the kube-dns/coredns pods.
kubectl get networkpolicy -n kube-system
If policies exist, examine their rules to see whether they allow ingress traffic on port 53 (UDP and TCP) from every namespace whose pods need DNS resolution.
Fix:
Add or modify a Network Policy to explicitly allow ingress traffic on UDP port 53 to the kube-dns or coredns pods from all namespaces or specific namespaces that need DNS resolution.
Example:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-dns # label used by kube-dns and by kubeadm-deployed CoreDNS
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector: {} # Allows from all pods in all namespaces
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53 # DNS falls back to TCP for large responses
Apply this policy:
kubectl apply -f <your-policy-file.yaml>
This works by ensuring that the firewall rules within the cluster explicitly permit DNS queries to reach the DNS service pods.
4. Pod’s resolv.conf is Incorrectly Configured
The resolv.conf file inside a pod is responsible for telling it how to perform DNS lookups. If this file is not correctly populated by kubelet or if it’s been manually overridden, DNS will fail.
Diagnosis:
Exec into a pod and inspect its resolv.conf:
kubectl exec -it <your-pod-name> -- cat /etc/resolv.conf
You should see lines like:
nameserver 10.96.0.10 # This should be your cluster DNS IP
search <namespace>.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
If the nameserver IP is missing, incorrect, or if the search domains are wrong, this is the issue.
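The same check can be automated against the file's contents. The following is a rough sketch, assuming you pass in the cluster DNS IP you expect; check_resolv_conf is a hypothetical helper, and it only checks that a matching nameserver line and a search line are present, not whether the search domains are the right ones.

```shell
# Read a resolv.conf on stdin and compare it with the expected
# cluster DNS IP given as $1. Exits non-zero when the nameserver is wrong.
check_resolv_conf() {
    awk -v want="$1" '
        $1 == "nameserver" { seen = 1; if ($2 == want) ok = 1 }
        $1 == "search"     { search = 1 }
        END {
            if (!seen)        print "FAIL: no nameserver line"
            else if (!ok)     print "FAIL: nameserver does not match " want
            else if (!search) print "WARN: nameserver ok, but no search line"
            else              print "OK"
            exit !(seen && ok)
        }'
}
```

For example: kubectl exec <your-pod-name> -- cat /etc/resolv.conf | check_resolv_conf 10.96.0.10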
Fix:
This is usually a symptom of the kubelet configuration issue mentioned in point 1. Ensure kubelet is correctly configured with --cluster-dns.
If you’ve set dnsPolicy: None in your pod spec, the kubelet ignores the cluster DNS settings and builds resolv.conf only from the pod’s dnsConfig field, so you must supply the nameservers, search domains, and options there yourself. Similarly, dnsPolicy: Default makes the pod inherit the node’s resolver configuration instead of the cluster DNS. For most use cases you want dnsPolicy: ClusterFirst (the default), which tells kubelet to configure resolv.conf for cluster DNS; switch the pod back to it and ensure kubelet is healthy.
This works because the resolv.conf file is the standard Unix-like system mechanism for DNS resolver configuration; a correct file means the system’s DNS lookup utilities can find the DNS server.
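If you do need dnsPolicy: None, Kubernetes takes the resolver settings from the dnsConfig field of the pod spec. The sketch below writes an illustrative manifest to a file so you can inspect or apply it. The helper name write_custom_dns_pod, the pod name, the image, and the 10.96.0.10 address are all placeholders; substitute your own cluster DNS IP and namespace.

```shell
# Write an illustrative Pod manifest that uses dnsPolicy: None to the path in $1.
# Pod name, image, namespace in the search list, and the IP are placeholders.
write_custom_dns_pod() {
    cat > "$1" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-example
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
    - 10.96.0.10
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
EOF
}
```

For example: write_custom_dns_pod /tmp/custom-dns-pod.yaml, then kubectl apply -f /tmp/custom-dns-pod.yaml. The resulting resolv.conf inside the pod should mirror the three dnsConfig sections.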
5. Incorrect kube-proxy Configuration or Status
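Pods reach cluster DNS through the DNS Service’s ClusterIP, a virtual IP that kube-proxy maps to the DNS pod endpoints using iptables or IPVS rules. If kube-proxy is missing or misconfigured on a node, queries sent to that IP from pods on the node are never delivered, even when the DNS pods themselves are healthy.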
Diagnosis:
Check the status of kube-proxy pods on each node:
kubectl get pods -n kube-system -l k8s-app=kube-proxy
Ensure all kube-proxy pods are in a Running state. Check their logs for errors.
Fix:
If kube-proxy pods are crashing or not running, investigate their logs. Common issues include incorrect configuration files or problems with the underlying node network interface. kube-proxy normally runs as a DaemonSet, so deleting an unhealthy pod makes the DaemonSet controller recreate it. Restarting the kubelet service on the affected node might also resolve transient issues.
sudo systemctl restart kubelet
This works because kube-proxy is critical for service discovery and load balancing, and a healthy kube-proxy ensures that traffic destined for services (including the DNS service) is correctly routed.
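When kube-proxy runs in iptables mode, it writes rules that match the DNS Service's ClusterIP, and you can check on a node whether they exist. This sketch counts such rules in iptables-save output. It assumes iptables mode and IPv4 (rules written as -d <ip>/32 with --dport 53); IPVS or nftables setups need a different check, and count_dns_service_rules is a hypothetical helper name.

```shell
# Count the rules on stdin (from `iptables-save`) that match traffic
# to the DNS ClusterIP given as $1 on destination port 53.
count_dns_service_rules() {
    grep -F -- "-d $1/32" | grep -c -E -- '--dport 53( |$)'
}
```

For example, on the affected node: sudo iptables-save -t nat | count_dns_service_rules 10.96.0.10. A healthy node typically shows at least one rule each for UDP and TCP; a count of 0 suggests kube-proxy has not programmed the Service on that node.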
6. Flannel/Calico/CNI Plugin Issues
The Container Network Interface (CNI) plugin is responsible for pod networking. If the CNI is not working correctly, pods might not be able to reach the kube-dns/CoreDNS service even if kubelet is configured correctly.
Diagnosis:
Check the status of your CNI daemonset pods (e.g., kube-flannel-ds, calico-node) on each node.
kubectl get pods -n kube-system -l app=flannel # For Flannel
kubectl get pods -n kube-system -l k8s-app=calico-node # For Calico
Look for any pods that are not running or have high restart counts. Check their logs.
Fix:
If CNI pods are unhealthy, examine their logs for network-related errors. This could involve issues with IP address management (IPAM), routing table conflicts, or problems with the underlying network infrastructure. Reinstalling or reconfiguring the CNI plugin might be necessary, often following the specific documentation for your chosen CNI.
This works by ensuring that the fundamental network fabric connecting pods and allowing them to communicate is functional, enabling reachability to the DNS service.
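To tell a pod-network (CNI) fault apart from a Service-routing (kube-proxy) fault, it helps to run two probes from a test pod: one DNS query sent straight to a DNS pod's IP and one sent to the DNS Service's ClusterIP. The sketch below only encodes how to read the two results; classify_dns_path is a hypothetical helper, and the interpretation is a rule of thumb, not a definitive diagnosis.

```shell
# Classify a DNS failure from two probe results, each "ok" or "fail":
#   $1 = query sent straight to a DNS pod IP
#   $2 = query sent to the DNS Service ClusterIP
classify_dns_path() {
    case "$1/$2" in
        ok/ok)
            echo "network path is fine; check resolv.conf and the kubelet settings" ;;
        ok/fail)
            echo "pod network works; suspect kube-proxy or Service routing" ;;
        fail/ok|fail/fail)
            echo "DNS pod unreachable directly; suspect the CNI plugin, a NetworkPolicy, or the DNS pods" ;;
        *)
            echo "usage: classify_dns_path ok|fail ok|fail"
            return 2 ;;
    esac
}
```

The probes themselves could be run from any pod that has nslookup, for example nslookup kubernetes.default <dns-pod-ip> and nslookup kubernetes.default <cluster-dns-ip>, with the DNS pod IPs taken from kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide.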
Once these issues are resolved, the next error you might encounter is related to pod startup delays if the cluster is under heavy load, or specific application-level errors if the application itself has further configuration issues.