The Jenkins agent went offline mid-build because the connection between the Jenkins controller and the agent process was unexpectedly terminated, preventing the agent from reporting its status or receiving further commands.

Common Causes and Fixes

  1. Network Interruption (Most Common)

    • Diagnosis: Check the agent’s network interface status and connectivity to the Jenkins controller’s IP address.
      ping <jenkins_controller_ip>
      traceroute <jenkins_controller_ip>
      
      Look for packet loss or high latency. Also, check firewall logs on both the agent and controller for dropped connections.
    • Fix: Ensure stable network infrastructure. For transient drops, rely on the agent’s own reconnect behavior: an inbound agent started with agent.jar retries the connection by default, so check the launch command (startup script or service file) and remove the -noReconnect option if it is present. Agents that the controller launches itself, for example over SSH, are relaunched by the controller according to the node’s Availability setting. For persistent issues, investigate upstream network devices (routers, switches) or Wi-Fi stability.
    • Why it works: This ensures the agent’s process is configured to automatically try and re-establish communication with the controller when the network link is temporarily broken and then restored.
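
A single ping run is easy to misread, so the check above can be wrapped in a small script that reports packet loss as one number. This is a minimal sketch, not a Jenkins feature: the function names, the 20-probe count, and the 1% default threshold are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Extract the packet-loss percentage from ping's summary line (read on stdin).
packet_loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: probe the controller and flag loss above a threshold.
# $1 = controller host or IP, $2 = tolerated loss in percent (default 1).
check_controller_link() {
    host="$1"
    max_loss="${2:-1}"
    loss=$(ping -c 20 "$host" 2>/dev/null | packet_loss_pct)
    if [ -z "$loss" ]; then
        echo "no ping summary for $host (unreachable or ICMP blocked)"
        return 2
    fi
    # awk handles fractional values such as 0.5 that the shell cannot compare.
    if awk -v l="$loss" -v m="$max_loss" 'BEGIN { exit !(l > m) }'; then
        echo "packet loss ${loss}% exceeds ${max_loss}%"
        return 1
    fi
    echo "packet loss ${loss}% is within ${max_loss}%"
}
```

Run check_controller_link <jenkins_controller_ip> from the agent host during a build window. Even a few percent of sustained loss may be enough to break a long-lived agent connection.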
  2. Agent Process Crash/Out of Memory

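    • Diagnosis: Examine the agent’s log files for OutOfMemoryError or any stack traces indicating a crash. Where they live depends on how the agent is launched: an agent started with -workDir writes its own logs under remoting/logs inside the work directory, while a wrapper script or service may redirect output to files such as jenkins.out or jenkins.err in the agent’s home directory.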
      tail -n 200 /path/to/agent/home/jenkins.out
      tail -n 200 /path/to/agent/home/jenkins.err
      
      If running as a service, check system logs like journalctl -u jenkins-agent or /var/log/syslog.
    • Fix: Increase the Java heap space allocated to the agent. Edit the agent startup script or systemd service file. For a script, find the java command and add a larger -Xmx value before -jar (e.g., java -Xmx4096m ... -jar agent.jar). The 4096m figure is only an example; pick a value that leaves room for the builds and the OS on that host. For a systemd service, modify the ExecStart line in /etc/systemd/system/jenkins-agent.service to include the JVM options, then run systemctl daemon-reload and restart the agent service.
    • Why it works: Providing more memory prevents the Java Virtual Machine running the agent from running out of heap space, which would cause it to crash.
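
Raising -Xmx only helps when the heap is what ran out. Other OutOfMemoryError variants, such as Metaspace or "unable to create native thread", need different fixes. The sketch below tallies which variants appear in a log; the function name is hypothetical and the log path is whatever file your agent actually writes to.

```shell
#!/bin/sh
# Count each OutOfMemoryError variant in a log file, most frequent first.
# $1 = path to an agent log file.
oom_kinds() {
    grep -o 'OutOfMemoryError: .*' "$1" | sort | uniq -c | sort -rn
}
```

For example, oom_kinds /path/to/agent/home/jenkins.err prints one line per variant with its count. If "Java heap space" dominates, the -Xmx change above is the right lever.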
  3. Agent Disk Full

    • Diagnosis: Check the available disk space on the agent machine, particularly on the partition where the agent’s work directory resides.
      df -h /path/to/agent/work_dir
      
    • Fix: Free up disk space by deleting old build artifacts, logs, or temporary files. If the issue is persistent, increase the disk size or move the agent’s files to a larger partition. In the agent’s configuration page on the controller, change "Remote root directory" to a path on the larger disk; for inbound agents launched with -workDir, update that argument in the launch command as well.
    • Why it works: The agent needs disk space to store build logs, temporary files, and downloaded SCM sources. When the disk fills up, the agent process can fail to write necessary data, leading to instability or crashes.
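
The df check can be turned into a threshold test that is easy to run from cron or as a pre-build step. This is a sketch under assumptions: the function names and the 90% default are illustrative, and the parsing relies on the POSIX df -P layout, where the usage percentage is the fifth column.

```shell
#!/bin/sh
# Parse the usage percentage from `df -P` output read on stdin.
df_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the usage percentage of the filesystem holding the given path.
disk_use_pct() {
    df -P "$1" | df_use_pct
}

# Succeed (exit 0) when usage is at or above the threshold.
# $1 = path, $2 = threshold in percent (default 90, an arbitrary choice).
disk_is_low() {
    [ "$(disk_use_pct "$1")" -ge "${2:-90}" ]
}
```

A call such as disk_is_low /path/to/agent/work_dir 85 && echo "clean up before the next build" gives an early warning before the agent starts failing writes.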
  4. Agent Timeout (Controller Side)

    • Diagnosis: The Jenkins controller has a timeout setting for agent connection checks. If the agent doesn’t respond within this period, the controller marks it as offline. Check the controller’s system logs for messages indicating a timed-out agent.
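    • Fix: Increase the ping timeout on the Jenkins controller. The controller’s ping thread is tuned with Java system properties, not environment variables set in the UI: hudson.slaves.ChannelPinger.pingIntervalSeconds (how often the controller pings the agent, 300 by default) and hudson.slaves.ChannelPinger.pingTimeoutSeconds (how long it waits for a reply, 240 by default). Pass them as -D options on the controller’s java command line, for example -Dhudson.slaves.ChannelPinger.pingTimeoutSeconds=480, and restart the Jenkins controller for this to take effect.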
    • Why it works: This gives the agent more time to respond to controller pings, especially in environments with higher network latency or when the agent is under heavy load and temporarily slow to respond.
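
As a sketch of how such a setting might be passed, the helper below builds the -D flags for the controller’s JVM. It assumes the ping behavior is governed by the hudson.slaves.ChannelPinger system properties and that their defaults are 300 and 240 seconds; verify both against the system properties reference for your Jenkins version. The function name and the example values are hypothetical.

```shell
#!/bin/sh
# Build the -D flags for the controller-side ping interval and timeout.
# $1 = ping interval in seconds, $2 = ping timeout in seconds.
# Defaults mirror the assumed Jenkins defaults (300 and 240).
pinger_opts() {
    interval="${1:-300}"
    timeout="${2:-240}"
    printf '%s %s\n' \
        "-Dhudson.slaves.ChannelPinger.pingIntervalSeconds=$interval" \
        "-Dhudson.slaves.ChannelPinger.pingTimeoutSeconds=$timeout"
}

# Example with illustrative values (not run here):
#   java $(pinger_opts 600 480) -jar jenkins.war
```

On packaged installations the flags usually go into the service’s Java options rather than a hand-written java command, but the property names are the same.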
  5. Agent JNLP Secret Mismatch

    • Diagnosis: If the agent uses JNLP connection and its secret (or credentials) has been rotated or changed on the controller, the agent will fail to authenticate and disconnect. Check the agent logs for authentication errors.
    • Fix: Update the secret the agent uses. In Jenkins, go to Manage Jenkins > Nodes > <Agent Name>. For an inbound agent, the node’s status page shows the launch command, including the secret the controller currently expects. Copy that secret into the agent’s startup script or service definition (the -secret argument), then restart the agent process.
    • Why it works: The secret is used for the agent to securely identify itself to the controller. An outdated secret means the controller rejects the agent’s connection attempts.
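
One way to keep the secret out of scripts and process listings is to store it in a file and pass it as -secret @file. The sketch below assumes a recent agent.jar that accepts the -url, -name, -secret, and -workDir options; older versions use a different launch syntax, so compare with the command shown on the node’s page. The function names and all argument values are hypothetical.

```shell
#!/bin/sh
# Write the secret to a new file that only the current user can read.
# $1 = secret copied from the node's page, $2 = destination file.
write_secret_file() {
    ( umask 077 && printf '%s\n' "$1" > "$2" )
}

# Print (do not run) the launch command for an inbound agent.
# $1 = controller URL, $2 = agent name, $3 = secret file, $4 = work directory.
agent_launch_cmd() {
    printf 'java -jar agent.jar -url %s -secret @%s -name %s -workDir %s\n' \
        "$1" "$3" "$2" "$4"
}
```

Printing the command instead of running it makes it easy to compare against the one Jenkins displays before changing the service definition.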
  6. Resource Starvation on Agent Host (CPU/Memory)

    • Diagnosis: Monitor the agent host’s CPU and memory utilization during builds. High utilization (consistently above 90%) can cause the agent process to become unresponsive or be killed by the OS’s Out-Of-Memory killer.
      top -bn1 | grep "Cpu(s)" # Check CPU usage
      free -m # Check memory usage
      dmesg | grep -i "killed process" # Check for OOM killer messages
      
    • Fix: Optimize build steps to reduce resource consumption, or allocate more resources (CPU, RAM) to the agent host. This might involve upgrading the hardware, using a more powerful instance type in cloud environments, or ensuring other non-Jenkins processes are not consuming excessive resources.
    • Why it works: When the host is starved of resources, the operating system may deprioritize or terminate processes like the Jenkins agent to maintain system stability, leading to the agent appearing offline.
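
To confirm that the OOM killer, and not the JVM itself, ended the agent, it helps to pull the process ID and name out of the kernel messages. The following is a minimal sketch that assumes the usual "Killed process <pid> (<name>)" wording in the kernel log; the function name is hypothetical.

```shell
#!/bin/sh
# Print "<pid> <process name>" for each OOM-killer victim found on stdin.
oom_victims() {
    sed -n 's/.*[Kk]illed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Typical use (not run here):
#   dmesg | oom_victims
#   journalctl -k | oom_victims
```

A line naming java around the time the agent dropped points at host memory pressure, in which case adding heap with -Xmx would make things worse rather than better.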

The next error you’ll likely encounter if the agent is still unstable is a Connection refused error when the controller attempts to re-establish contact, or a Build timed out error if the agent is still partially responsive but not completing its tasks.
