K3s nodes are failing to join the cluster, and `kubectl get nodes` shows them as `NotReady`.
The k3s-agent process on the worker nodes is failing to establish a connection to the K3s server, usually due to network issues or misconfiguration.
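To confirm the symptom before working through the causes, one option is to filter the node list for anything that is not `Ready`. The sketch below is a minimal helper, assuming the input is the default table output of `kubectl get nodes --no-headers` (node name in column 1, status in column 2). The function name `list_not_ready` is made up for illustration and is not part of K3s or kubectl.

```shell
#!/bin/sh
# Hypothetical helper: print the names of nodes whose STATUS is not Ready.
# Reads `kubectl get nodes --no-headers` output on stdin.
list_not_ready() {
    # The status column can be compound (e.g. "Ready,SchedulingDisabled"),
    # so only the part before the first comma is compared.
    awk 'NF >= 2 { split($2, s, ","); if (s[1] != "Ready") print $1 }'
}

# Usage (on a machine with cluster access):
#   kubectl get nodes --no-headers | list_not_ready
```

Note that an agent that never registered at all will not appear in the node list, so an empty result does not rule out the causes below.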
Common Causes and Fixes
- Incorrect Server URL: The `server` URL configured on the agent is wrong, preventing it from finding the control plane.
  - Diagnosis: Check the `K3S_URL` environment variable or the contents of `/etc/rancher/k3s/config.yaml` on the agent node. For example, `sudo cat /etc/rancher/k3s/config.yaml` or `sudo printenv K3S_URL`. If the agent was installed with the K3s install script, `K3S_URL` is normally stored in `/etc/systemd/system/k3s-agent.service.env` rather than in your shell environment, so check that file as well.
  - Fix: Ensure the `server` URL in `/etc/rancher/k3s/config.yaml` or the `K3S_URL` environment variable points to the correct IP address or hostname of the K3s server node, including the correct port (default 6443). For instance, if your server is at `192.168.1.100`, the URL should be `https://192.168.1.100:6443`. Edit the file: `sudo sed -i 's/server: https:\/\/old-ip:6443/server: https:\/\/192.168.1.100:6443/' /etc/rancher/k3s/config.yaml`. Then restart the agent with `sudo systemctl restart k3s-agent` so the change takes effect.
  - Why it works: The agent needs the precise address to know where to send its registration requests.
- Firewall Blocking Port 6443: Network firewalls (on the nodes or in between) are blocking the Kubernetes API server port.
  - Diagnosis: From the agent node, run `curl -k https://<server_ip>:6443`. If the request times out or the connection is refused, the port is likely blocked or nothing is listening on it. A `401 Unauthorized` response (or a certificate error when `-k` is omitted) means the port is reachable and the problem lies elsewhere. Use `sudo ufw status` on Ubuntu or `sudo firewall-cmd --list-all` on CentOS/RHEL to check local firewall rules.
  - Fix: Open port 6443 for TCP traffic on the server node’s firewall. For `ufw`: `sudo ufw allow 6443/tcp`. For `firewalld`: `sudo firewall-cmd --zone=public --add-port=6443/tcp --permanent && sudo firewall-cmd --reload`.
  - Why it works: Unblocking the port allows the agent to communicate with the server’s API endpoint.
- K3s Agent Service Not Running or Crashing: The `k3s-agent` service itself has failed to start or is repeatedly crashing due to an internal error or resource constraint.
  - Diagnosis: Check the status of the K3s agent service: `sudo systemctl status k3s-agent`. Look for "active (running)" or "inactive (dead)" and any error messages in the output. Also, check the agent’s logs: `sudo journalctl -u k3s-agent -f`.
  - Fix: If the service is inactive, try starting it: `sudo systemctl start k3s-agent`. If it is failing, investigate the `journalctl` output for specific errors. Common causes include insufficient memory, a full disk, or a corrupted configuration file. Once the underlying problem is fixed, restart the service: `sudo systemctl restart k3s-agent`.
  - Why it works: Ensures the agent process is actively running and attempting to connect.
- Incorrect Token: The `token` specified in the agent’s configuration does not match the `token` on the server, preventing authentication.
  - Diagnosis: Compare the token in `/etc/rancher/k3s/config.yaml` (or the `K3S_TOKEN` env var) on the agent with the `token` used when starting the K3s server (often found in `/var/lib/rancher/k3s/server/node-token` on the server, or specified via the `--token` flag).
  - Fix: Ensure the `token` value in the agent’s `/etc/rancher/k3s/config.yaml` exactly matches the server’s token. You can update it on the agent: `sudo sed -i 's/token: <old-token>/token: <correct-token>/' /etc/rancher/k3s/config.yaml` and then restart the agent: `sudo systemctl restart k3s-agent`.
  - Why it works: The token is a shared secret that the agent uses to authenticate itself to the server.
- Network Connectivity Issues (General): While firewalls are a common culprit, more general network problems like incorrect subnet masks, routing issues, or DNS resolution failures can also prevent communication.
  - Diagnosis: From the agent node, try basic network checks: `ping <server_ip>` and `traceroute <server_ip>`. Ensure DNS resolution works for the server’s hostname if you’re using one: `nslookup <server_hostname>`.
  - Fix: Correct any IP addressing, subnetting, or routing misconfigurations on the agent’s network interface. If using hostnames, ensure the DNS server is correctly configured (`/etc/resolv.conf`) and accessible. For example, if `ping` fails, check `/etc/netplan/*.yaml` or `/etc/sysconfig/network-scripts/ifcfg-*` for IP/gateway settings.
  - Why it works: Establishes a reliable IP-level path for K3s traffic.
- K3s Server Not Ready or Unhealthy: The K3s server itself might be experiencing issues, preventing it from accepting new agent connections.
  - Diagnosis: SSH into the K3s server node and check its service status: `sudo systemctl status k3s`. Also, examine its logs: `sudo journalctl -u k3s -f`. Check if `kubectl get nodes` on the server shows the server node itself as `Ready`.
  - Fix: If the K3s server is not running or showing errors, resolve those issues first. This might involve restarting the server (`sudo systemctl restart k3s`), checking its configuration, or ensuring it has sufficient resources.
  - Why it works: A healthy control plane is a prerequisite for any agents to join.
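The server URL and port checks above can be partly scripted. The following is a rough sketch, not an official K3s tool: `valid_server_url` and `explain_curl_exit` are hypothetical helper names, the URL pattern only covers plain hostnames and IPv4 addresses (not bracketed IPv6), and the messages are one interpretation of curl's documented exit codes.

```shell
#!/bin/sh
# Hypothetical helpers for triaging agent-to-server connectivity.

# Succeeds if the argument looks like https://<host>:<port>.
# Assumption: host is a DNS name or IPv4 address; IPv6 literals are not handled.
valid_server_url() {
    printf '%s\n' "$1" | grep -Eq '^https://[A-Za-z0-9.-]+:[0-9]+$'
}

# Translate a curl exit code into a likely cause from the list above.
explain_curl_exit() {
    case "$1" in
        0)     echo "reachable: the server answered on this port" ;;
        6)     echo "dns: the server hostname could not be resolved" ;;
        7)     echo "refused: nothing accepted the connection" ;;
        28)    echo "timeout: packets are probably dropped by a firewall or bad route" ;;
        35|60) echo "tls: the port is reachable but the TLS handshake or certificate check failed" ;;
        *)     echo "other: see the curl manual for exit code $1" ;;
    esac
}

# Usage on the agent node (replace <server_ip> with your server's address):
#   valid_server_url "https://<server_ip>:6443" || echo "URL looks malformed"
#   curl -sk --max-time 5 -o /dev/null "https://<server_ip>:6443"
#   explain_curl_exit $?
```

An exit code of 0 here only shows that the port answers. An unauthenticated request normally gets a `401 Unauthorized` response, so token and service problems still need the checks described above.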
After resolving these, you may next encounter `No CNI configuration found` if your CNI plugin (like Flannel) isn’t installed or properly configured on the nodes.