Jenkins is failing to connect to one or more of its agent nodes, preventing builds from running.

Common Causes and Fixes for Jenkins Node Offline Errors:

1. SSH Connection Issues:

  • Diagnosis: On the Jenkins controller, run ssh -vvv <jenkins-agent-user>@<agent-hostname-or-ip> to get verbose output. Look for authentication failures, permission denied messages, or connection refused.
  • Cause: The SSH daemon on the agent is not running, the Jenkins controller’s SSH key is not authorized on the agent, or firewall rules are blocking the connection.
  • Fix:
    • SSH Daemon: On the agent, run sudo systemctl status sshd (on Debian and Ubuntu the unit is named ssh). If it is not active, start it with sudo systemctl start sshd and enable it at boot with sudo systemctl enable sshd.
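    • Authorized Keys: On the agent, ensure the Jenkins controller’s public SSH key (typically ~/.ssh/id_rsa.pub on the controller) is present in the ~/.ssh/authorized_keys file for the user Jenkins connects as (e.g., /home/jenkins/.ssh/authorized_keys). If not, append it. Also check permissions: with its default StrictModes setting, sshd rejects keys when ~/.ssh or authorized_keys is writable by group or others, so chmod 700 ~/.ssh and chmod 600 ~/.ssh/authorized_keys are safe values.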
    • Firewall: On the agent, check firewall rules. If using ufw, run sudo ufw status. If SSH (port 22) is blocked, allow it with sudo ufw allow ssh or sudo ufw allow 22/tcp.
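  • Why it works: SSH is one of the most common ways for the Jenkins controller to launch and communicate with its agents. If the SSH server isn’t running or the port is blocked, the TCP connection never opens; if the key isn’t authorized, the connection opens but authentication fails. In either case the controller cannot start the agent.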
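
The SSH checks above can be scripted from the controller. The following is a minimal sketch rather than a definitive tool: the function names classify_ssh_error and check_agent_ssh are made up for illustration, and the matched strings are the usual OpenSSH client messages, which are assumed here and may vary by version or locale.

```shell
#!/bin/sh
# Map an ssh error message to one of the three causes described above.
# The patterns are common OpenSSH client messages (an assumption; wording can vary).
classify_ssh_error() {
    case "$1" in
        *"Connection refused"*)              echo "sshd-not-listening" ;;
        *"Permission denied"*)               echo "key-not-authorized" ;;
        *"timed out"*|*"No route to host"*)  echo "firewall-or-routing" ;;
        *)                                   echo "unknown" ;;
    esac
}

# Try a non-interactive login and report the likely cause.
# Usage: check_agent_ssh <user> <host> [port]
check_agent_ssh() {
    user=$1 host=$2 port=${3:-22}
    # BatchMode disables password prompts so the check cannot hang waiting for input.
    if err=$(ssh -o BatchMode=yes -o ConnectTimeout=5 -p "$port" "$user@$host" true 2>&1); then
        echo "ok"
    else
        classify_ssh_error "$err"
    fi
}
```

For example, check_agent_ssh jenkins agent1.example.com 22 (a hypothetical host) prints ok or one of the cause labels. An unknown result still needs the ssh -vvv output to be read by hand.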

2. Agent Process Not Running or Crashing:

  • Diagnosis: Log into the agent machine and check for the agent process (a jenkins-agent service, or java -jar agent.jar if started manually). Use ps aux | grep jenkins or pgrep -fl jenkins.
  • Cause: The agent process crashed due to an out-of-memory error, an unhandled exception, or was manually stopped.
  • Fix:
    • Restart Agent: If the process is not running, start it. If it’s managed by systemd, use sudo systemctl start jenkins-agent. If running manually, restart the java -jar agent.jar ... command.
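    • Check Logs: Examine the agent’s log (e.g., journalctl -u jenkins-agent for a systemd service, the console output if run manually, or the node’s log page in the Jenkins UI) for stack traces or error messages indicating why it might have crashed. Note that /var/log/jenkins/jenkins.log is normally the controller’s log, not the agent’s. Address the underlying issue (e.g., increase the JVM heap size with -Xmx if the cause is an OOM).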
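  • Why it works: The Jenkins agent process is what receives commands from the controller and executes build tasks. An inbound agent must be started on the agent machine and kept running; an SSH-launched agent is started by the controller when it connects. If the process isn’t running, the controller has nothing to talk to and marks the node offline.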
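
The process and log checks above can be combined in a small script. This is a sketch under assumptions: the helper names agent_running and crash_hint are hypothetical, and the only crash signature it looks for is the standard java.lang.OutOfMemoryError text.

```shell
#!/bin/sh
# Return 0 if a process whose command line mentions agent.jar is running.
agent_running() {
    pgrep -f 'agent\.jar' >/dev/null 2>&1
}

# Read log text on stdin and print "oom" if it contains an OutOfMemoryError,
# otherwise "none". Other crash causes still need a manual read of the log.
crash_hint() {
    if grep -qF 'java.lang.OutOfMemoryError'; then
        echo "oom"
    else
        echo "none"
    fi
}
```

With a systemd-managed agent this could be used as agent_running || journalctl -u jenkins-agent -n 200 --no-pager | crash_hint, where jenkins-agent is the unit name used in the fix above.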

3. Network Connectivity / DNS Issues:

  • Diagnosis: From the Jenkins controller, ping <agent-hostname-or-ip> and telnet <agent-hostname-or-ip> 22. From the agent, ping <jenkins-controller-hostname-or-ip>.
  • Cause: The agent machine cannot resolve the controller’s hostname, or vice-versa. Network routing issues exist, or a network device (router, switch, firewall) is dropping packets.
  • Fix:
    • DNS: Ensure /etc/resolv.conf on both controller and agent point to valid DNS servers. Verify hostnames resolve correctly using dig <hostname> or nslookup <hostname>.
    • Routing/Firewall: If ping/telnet fails, investigate the network infrastructure. Ensure no intermediate firewalls are blocking traffic on port 22 (SSH) or, if using JNLP, on the controller’s inbound agent port (commonly 50000).
  • Why it works: Basic network reachability is fundamental. If hosts can’t find each other by name or IP, or if the necessary ports are blocked, communication fails.
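
The name-resolution and port checks above can be run in order so that the first failing layer is reported. The sketch below makes several assumptions: check_network, resolves, and port_open are hypothetical names, getent ahosts assumes a glibc system, and nc -z -w 5 assumes a netcat build that supports those flags. Both probe commands are kept in variables so they can be swapped for whatever your system provides.

```shell
#!/bin/sh
# Commands used for the probes; override them if your system lacks getent or nc.
RESOLVE_CMD=${RESOLVE_CMD:-"getent ahosts"}
PROBE_CMD=${PROBE_CMD:-"nc -z -w 5"}

# Return 0 if the hostname (or IP) resolves.
resolves() {
    $RESOLVE_CMD "$1" >/dev/null 2>&1
}

# Return 0 if a TCP connection to host:port succeeds.
port_open() {
    $PROBE_CMD "$1" "$2" >/dev/null 2>&1
}

# Report the first failing layer: DNS first, then TCP reachability.
# Usage: check_network <host> [port]
check_network() {
    host=$1 port=${2:-22}
    if ! resolves "$host"; then
        echo "dns-failure"
        return 1
    fi
    if ! port_open "$host" "$port"; then
        echo "port-unreachable"
        return 1
    fi
    echo "ok"
}
```

Run it from the controller against the agent (port 22), and from the agent against the controller’s own port, since a failure can exist in one direction only.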

4. Incorrect Agent Configuration in Jenkins:

  • Diagnosis: Navigate to "Manage Jenkins" -> "Nodes" -> <Your Agent Node Name> on the Jenkins controller. Review the "Host" and "Port" (if applicable) settings.
  • Cause: The hostname or IP address configured for the agent in Jenkins is incorrect, or the SSH port is wrong (e.g., if the agent’s SSH server runs on a non-standard port).
  • Fix: Update the "Host" field to the correct IP address or FQDN of the agent. If the agent’s SSH daemon listens on a non-standard port (e.g., 2222), ensure the "Port" field in the Jenkins node configuration matches.
  • Why it works: Jenkins uses these exact details to establish the connection. Typos or outdated information directly lead to connection failures.

5. Agent Launch Method Issues (JNLP/WebSockets):

  • Diagnosis: Check the agent logs for errors related to JNLP handshake, WebSocket connection, or security exceptions. On the controller, check the "Agents" section under "Manage Jenkins" -> "Security", which sets the TCP port for inbound agents.
  • Cause: If using JNLP (Java Network Launch Protocol) or WebSockets for agent connection, the secret passed to the agent might be invalid, the agent might be unable to reach the controller’s inbound agent TCP port (often fixed at 50000, or randomly assigned), or a proxy is interfering. WebSocket agents connect over the controller’s regular HTTP(S) port instead, so a reverse proxy that does not pass WebSocket upgrades can block them.
  • Fix:
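    • JNLP Secret: Open the agent’s page in the Jenkins UI and copy the secret from the launch command shown there. The secret is tied to the node, so a node that was renamed, or deleted and recreated, needs the new value; re-saving the configuration should not be expected to issue a new secret.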
    • Agent Command: Ensure the agent is launched with the correct secret from the controller’s UI (e.g., java -jar agent.jar -jnlpUrl <controller-url>/computer/<agent-name>/jenkins-agent.jnlp -secret <secret-from-ui>).
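    • Port: Verify the controller’s inbound agent TCP port (commonly fixed at 50000) is enabled, open on the controller, and reachable from the agent, e.g. with telnet <jenkins-controller-hostname-or-ip> 50000. WebSocket agents do not need this port.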
  • Why it works: JNLP/WebSockets are alternative communication methods. They rely on specific ports and secrets for authentication and data transfer. Misconfigurations here break the agent’s ability to "check in" with the controller.
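
A wrapper script helps avoid typos in the launch command above. This is only a sketch: jnlp_url and launch_agent are hypothetical helper names, the URL layout is the one quoted in the fix, and the agent name is assumed to need no URL encoding. Newer agent.jar versions may expect a different set of options, so compare against the launch command shown on the agent’s page in the Jenkins UI.

```shell
#!/bin/sh
# Build the JNLP URL for an agent, tolerating a trailing slash on the controller URL.
# Assumes the agent name contains no characters that need URL encoding.
jnlp_url() {
    base=${1%/}
    echo "$base/computer/$2/jenkins-agent.jnlp"
}

# Launch the agent; the secret is read from a file so it stays out of shell history.
# Usage: launch_agent <controller-url> <agent-name> <secret-file>
launch_agent() {
    secret=$(cat "$3") || return 1
    exec java -jar agent.jar -jnlpUrl "$(jnlp_url "$1" "$2")" -secret "$secret"
}
```

For example, launch_agent https://jenkins.example.com build-agent-1 /home/jenkins/agent.secret, where the host, agent name, and file path are placeholders.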

6. Resource Exhaustion on Agent or Controller:

  • Diagnosis: Monitor CPU, memory, and disk I/O on both the agent and controller. Use top, htop, vmstat, iostat on Linux/macOS, or Task Manager/Resource Monitor on Windows.
  • Cause: The agent machine has run out of memory or CPU, preventing the agent process from responding. Similarly, the controller might be overloaded, unable to manage its agents effectively.
  • Fix:
    • Agent: Free up resources by stopping unnecessary processes. Increase RAM or CPU allocated to the agent VM/server.
    • Controller: Optimize Jenkins jobs, increase controller resources, or distribute the load by adding more controllers or agents.
  • Why it works: When systems are starved of resources, even essential services like the Jenkins agent process can become unresponsive or crash, leading to an "offline" status.
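
For a quick check without an interactive tool such as top, memory and load can be read directly on a Linux agent or controller. The sketch below is Linux-only and assumes a kernel that reports MemAvailable in /proc/meminfo; the function names and the 512 MiB default threshold are arbitrary examples, not recommended values.

```shell
#!/bin/sh
# Print available memory in MiB, reading /proc/meminfo-formatted text on stdin.
mem_available_mb() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }'
}

# Print the 1-minute load average, reading /proc/loadavg-formatted text on stdin.
load_1min() {
    awk '{ print $1 }'
}

# Print a one-line summary and return 1 if available memory is below the threshold.
# Usage: check_resources [min-mib]
check_resources() {
    min_mb=${1:-512}
    mem=$(mem_available_mb < /proc/meminfo)
    mem=${mem:-0}   # treat a missing MemAvailable field as 0
    load=$(load_1min < /proc/loadavg)
    echo "available memory: ${mem} MiB, 1-minute load: ${load}"
    if [ "$mem" -lt "$min_mb" ]; then
        echo "warning: less than ${min_mb} MiB available"
        return 1
    fi
}
```

Compare the load figure against the number of CPU cores on the machine; a load that stays well above the core count suggests the host is CPU-bound.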

Once the node is back online, the next failure you may see is a "Build Step Failed" error, which means the agent connects but is still not properly configured to execute the specific build commands.
