The K3s control plane components are failing to become healthy because the embedded etcd datastore is not starting up quickly enough, leading to timeouts in the Kubernetes API server and controller manager.

The most common culprit is a K3s service that is starved of CPU or memory. K3s, particularly when running embedded etcd, needs enough of both to initialize its components and establish quorum. If the host is undersized, or if cgroup limits on the systemd unit are too tight, the K3s processes get throttled and etcd falls behind.

Diagnosis: Check the systemd service status and logs for K3s.

sudo systemctl status k3s
sudo journalctl -u k3s -f

Look for messages indicating etcd startup delays or the API server failing to connect to etcd, such as etcdserver: request timed out or etcdserver: leader changed.
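A long journal is easier to scan once it is narrowed to etcd-related lines. The sketch below is one way to do that; the function name etcd_lines and the search pattern are illustrative choices, not part of K3s, and the pattern may need widening for your logs.

```shell
# Print only log lines that mention etcd or the datastore (case-insensitive).
# The pattern is an assumption about what is worth seeing, not an exhaustive list.
etcd_lines() {
    grep -iE 'etcd|datastore' || true
}

# Usage (on a K3s server node):
#   sudo journalctl -u k3s --since "15 minutes ago" --no-pager | etcd_lines
```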

Fix: Raise or remove restrictive resource limits on the K3s systemd service. The unit file is typically located at /etc/systemd/system/k3s.service, but rather than editing it directly, create an override (drop-in) file:

sudo systemctl edit k3s.service

Add the following to the override file:

[Service]
CPUQuota=200%
MemoryMax=2G

Then, reload systemd and restart K3s:

sudo systemctl daemon-reload
sudo systemctl restart k3s

Why it works: CPUQuota and MemoryMax are ceilings, not reservations. CPUQuota=200% lets the K3s processes use up to two full CPU cores' worth of time, and MemoryMax=2G lets them use up to 2 GB of RAM. Raising limits that were previously tighter gives etcd and the API server room to start quickly and reach a healthy state, which prevents the etcdserver: request timed out errors. If no limits were set in the first place, these values will not speed anything up; in that case the node itself needs more CPU or memory, or K3s needs protection from competing workloads through settings such as CPUWeight or MemoryMin.
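To see which limits systemd is actually enforcing on the unit, systemctl show can print the effective properties. The helper below is a small sketch around that output; limit_is_unset is a hypothetical name, and it assumes the usual convention that systemd reports an unset limit as infinity.

```shell
# Succeeds when a "Property=value" line from `systemctl show` reports no limit.
limit_is_unset() {
    case "${1#*=}" in
        infinity) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage (on the K3s node):
#   systemctl show k3s -p CPUQuotaPerSecUSec -p MemoryMax | while read -r line; do
#       if limit_is_unset "$line"; then echo "$line (no limit)"; else echo "$line (limited)"; fi
#   done
```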

Another frequent cause is a misconfigured or absent node-ip flag, especially on systems with multiple network interfaces. K3s needs to know which IP address to bind its services to, and if it guesses incorrectly or tries to bind to an unavailable interface, etcd’s peer communication can fail.

Diagnosis: Inspect the K3s configuration file or command-line arguments.

sudo cat /etc/rancher/k3s/config.yaml
# Or, if not using a config file, check the systemd service definition for arguments
sudo systemctl cat k3s.service

Look for the node-ip parameter or --node-ip flag.

Fix: Explicitly set the node-ip in /etc/rancher/k3s/config.yaml or as a systemd service argument to the correct IP address of the node.

# /etc/rancher/k3s/config.yaml
node-ip: 192.168.1.100

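Or as a systemd argument. The empty ExecStart= line clears the original command so that the drop-in replaces it instead of adding a second one; keep any other flags from the original ExecStart line.

# In /etc/systemd/system/k3s.service.d/override.conf
# [Service]
# ExecStart=
# ExecStart=/usr/local/bin/k3s server --node-ip 192.168.1.100

Then restart K3s (run sudo systemctl daemon-reload first if you changed the unit override):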

sudo systemctl restart k3s

Why it works: K3s uses node-ip as the address it advertises for the node, including for etcd peer traffic. Pinning it to the correct, reachable address lets peer discovery and client connections succeed without network-level timeouts.
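Before committing a node-ip value, it can help to confirm that the address is actually assigned to an interface on this host. A minimal sketch, assuming the iproute2 ip tool is available; the function name has_local_ip is made up for illustration, and it handles IPv4 only.

```shell
# Reads `ip -o -4 addr show` output on stdin and succeeds if the given
# IPv4 address is assigned to any interface. Field 4 looks like 192.168.1.100/24.
has_local_ip() {
    awk -v want="$1" '{ split($4, a, "/"); if (a[1] == want) found = 1 }
                      END { exit (found ? 0 : 1) }'
}

# Usage:
#   ip -o -4 addr show | has_local_ip 192.168.1.100 && echo "address is local"
```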

Network configuration issues, particularly firewall rules or SELinux/AppArmor policies, can block the necessary communication ports for etcd. Etcd relies on specific ports (2379 for client, 2380 for peer) to function.

Diagnosis: Check firewall status and audit logs.

sudo ufw status verbose # if using UFW
sudo firewall-cmd --list-all # if using firewalld
sudo ausearch -m avc -ts recent # SELinux audit logs
sudo aa-status # AppArmor status

Look for firewall rules that block or omit ports 2379 and 2380, and for SELinux denied (AVC) entries that mention those ports or the K3s binary itself.

Fix: Open the required ports in the firewall and adjust security policies. For UFW:

sudo ufw allow 2379/tcp
sudo ufw allow 2380/tcp
sudo ufw reload

For firewalld:

sudo firewall-cmd --add-port=2379/tcp --permanent
sudo firewall-cmd --add-port=2380/tcp --permanent
sudo firewall-cmd --reload

For SELinux, denials show up as denied entries in the ausearch output. If you see them, K3s is being prevented from binding to or using its ports. Common remedies are to install or update the SELinux policy for K3s (packaged as k3s-selinux on RPM-based distributions) or to relabel the K3s executables, rather than disabling enforcement altogether.

Why it works: This explicitly permits network traffic on the ports etcd uses for client requests and inter-node communication, resolving the connection refused or timeout errors that arise when those ports are blocked.
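Once the rules are in place, a quick TCP connect test from another server node shows whether the etcd ports are reachable. The sketch below assumes bash and the coreutils timeout command are installed; port_open is an illustrative helper, and a successful connect only proves that something is listening, not that etcd is healthy.

```shell
# Succeeds if a TCP connection to host:port can be opened within 2 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools beyond timeout.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (replace the address with a server node's IP):
#   for p in 2379 2380; do
#       if port_open 192.168.1.100 "$p"; then echo "$p open"; else echo "$p blocked or closed"; fi
#   done
```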

A corrupted etcd data directory can prevent etcd from starting correctly. This is less common but can happen due to disk issues or improper shutdowns.

Diagnosis: Examine the etcd data directory for errors or inconsistencies.

sudo ls -l /var/lib/rancher/k3s/server/db/

Look for unusual file sizes, permissions, or recent modification timestamps that don’t align with expected operations. Also check journalctl -u k3s for etcd messages that mention WAL or database corruption.

Fix: If corruption is suspected and you don’t have a critical need for the existing state, the simplest fix is to reset the etcd data directory. WARNING: This will destroy all cluster state. The command below removes only the etcd subdirectory, so any snapshots stored under db/snapshots are left in place.

sudo systemctl stop k3s
sudo rm -rf /var/lib/rancher/k3s/server/db/etcd
sudo systemctl start k3s

If you need to recover the state instead, restore from an etcd snapshot, which is a more involved operation; K3s does this through k3s server --cluster-reset --cluster-reset-restore-path=<snapshot file>.

Why it works: By removing the corrupted data, K3s will initialize a fresh etcd datastore on startup, allowing the control plane to come online cleanly.
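Because a reset is irreversible, keeping a copy of the old directory leaves room for later inspection or recovery attempts. A minimal sketch, with backup_dir as a hypothetical helper; it assumes K3s is already stopped and that the filesystem has room for the copy.

```shell
# Copies a directory to "<dir>.bak-<timestamp>", preserving ownership and modes,
# and prints the path of the copy.
backup_dir() {
    src="${1%/}"
    dest="${src}.bak-$(date +%Y%m%d%H%M%S)"
    cp -a "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage (after `sudo systemctl stop k3s`, run as root):
#   backup_dir /var/lib/rancher/k3s/server/db
```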

Insufficient disk space on the partition where /var/lib/rancher/k3s/server/db/ resides will prevent etcd from writing its WAL (Write-Ahead Log) and snapshot files, leading to startup failures.

Diagnosis: Check available disk space on the relevant partition.

df -h /var/lib/rancher/k3s/server/db/

Look for partitions that are at or near 100% usage.
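For repeated checks or a simple alert, the usage percentage can be extracted from df directly. This is a sketch rather than a monitoring solution; disk_usage_pct is an illustrative name, and the 90% threshold in the usage comment is arbitrary.

```shell
# Prints the usage percentage (0-100) of the filesystem holding the given path.
# `df -P` forces the portable one-line-per-filesystem output format.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage: warn when the datastore's filesystem is more than 90% full.
#   pct=$(disk_usage_pct /var/lib/rancher/k3s/server/db/)
#   if [ "$pct" -gt 90 ]; then echo "warning: ${pct}% used"; fi
```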

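Fix: Free up disk space or move the K3s data directory to a partition with more space. To free space, identify and remove large, unnecessary files, or prune unused container images (sudo k3s crictl rmi --prune when K3s uses its bundled containerd; docker image prune if it is configured to use Docker). If moving is necessary, note that the data-dir setting refers to the whole K3s data directory (/var/lib/rancher/k3s by default), not only server/db/. Stop K3s, move that directory, and set data-dir in /etc/rancher/k3s/config.yaml.

# Example: move to /mnt/ssd/k3s-data
sudo systemctl stop k3s
# Stop leftover containers so nothing holds mounts under the old path
sudo /usr/local/bin/k3s-killall.sh
sudo mv /var/lib/rancher/k3s /mnt/ssd/k3s-data
# Update /etc/rancher/k3s/config.yaml
# data-dir: /mnt/ssd/k3s-data
sudo systemctl start k3s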

Why it works: Etcd must be able to append to its WAL and write snapshots at all times. Keeping enough free space prevents the write failures that would otherwise halt etcd’s initialization.

Finally, the K3s version itself might have a known bug related to startup performance or etcd initialization under specific conditions.

Diagnosis: Check the K3s version and consult release notes.

k3s --version

Visit the K3s GitHub releases page and search for issues related to startup delays or etcd in your version or recent versions.

Fix: Upgrade to a newer, stable K3s version, following the official K3s upgrade guide. With the install script, this means re-running it with the target version pinned through INSTALL_K3S_VERSION, using the same options as the original installation.

# Example: upgrading with the install script
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.28.5+k3s1 sh - # Replace with desired version
sudo systemctl restart k3s

Why it works: Later versions often contain bug fixes and performance improvements that resolve underlying issues preventing timely startup of critical control plane components.
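When comparing several nodes against the release notes, it helps to pull just the release tag out of the version output. A minimal sketch, assuming the first line of k3s --version has the form "k3s version <tag> (<commit>)"; k3s_release_tag is a hypothetical helper name.

```shell
# Extracts the release tag (for example v1.28.5+k3s1) from `k3s --version`
# output read on stdin. Assumes the first matching line looks like
# "k3s version v1.28.5+k3s1 (<commit>)".
k3s_release_tag() {
    awk '$1 == "k3s" && $2 == "version" { print $3; exit }'
}

# Usage:
#   k3s --version | k3s_release_tag
```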

Once all of these are fixed, the next error you are likely to see comes from kubectl, for example The connection to the server <host>:6443 was refused. It indicates that K3s is starting but the Kubernetes API is not yet accepting client connections, possibly because the control plane is still settling or because of network configuration on the client side.
