Your GitHub Actions jobs are timing out because the runner, the machine executing your workflow, is not responding to the GitHub Actions control plane within the expected time limit. This usually happens when the runner gets stuck on a long-running task, loses its network connection, or runs out of resources.
Cause 1: Runner is Overloaded or Unresponsive
The runner itself might be struggling to keep up. This could be due to insufficient resources (CPU, memory, disk I/O) or a process on the runner consuming all available resources, preventing it from checking in with GitHub Actions.
Diagnosis:
Check the runner’s resource utilization. If you have self-hosted runners, SSH into the runner machine and use top or htop to see if any process is hogging CPU or memory. For GitHub-hosted runners, you can’t directly inspect them, but a consistently slow job or timeouts across different jobs suggest a general runner issue.
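On a self-hosted Linux runner, the checks above can be bundled into a one-shot snapshot. This is a minimal bash sketch. runner_resource_snapshot is a hypothetical helper name, and it assumes a Linux host with /proc plus the usual coreutils and procps tools (df, ps).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a one-shot resource snapshot on a self-hosted Linux runner.
runner_resource_snapshot() {
  echo "== Load average (1, 5, 15 min) =="
  cut -d ' ' -f 1-3 /proc/loadavg

  echo "== Memory =="
  # MemAvailable is the kernel's estimate of memory usable without swapping.
  grep -E '^(MemTotal|MemAvailable|SwapFree):' /proc/meminfo

  echo "== Disk usage of the working directory =="
  df -h .

  echo "== Top 5 processes by CPU =="
  # head -n 6 keeps the header line plus five processes.
  ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 6
}

runner_resource_snapshot
```

Run it a few times while a job is executing. If the same process stays at the top with sustained CPU, or MemAvailable keeps shrinking, that matches the overload described in this section.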
Fix: For self-hosted runners:
- Scale Up: If you’re self-hosting, ensure your runner machine meets or exceeds the recommended specifications for your workload. Increase RAM or CPU cores. For example, if you’re using a VM, upgrade its instance type.
- Resource Limits: If you’re using Docker or Kubernetes for self-hosted runners, ensure your runner container has adequate resource limits set, or increase them. For example, in a docker-compose.yml, you might set deploy.resources.limits.cpus to "4" and deploy.resources.limits.memory to 4096M for the runner service.
- Identify Problematic Process: If a specific process is the culprit, investigate why it’s consuming so many resources. It might be a bug in your code or a dependency.
For GitHub-hosted runners:
- Use a Larger Runner: If your workflow requires more power and your plan includes larger runners, have an organization or enterprise administrator create one. Then change runs-on: ubuntu-latest to that runner’s name or label. Switching to a different image, such as ubuntu-20.04, does not give the job more CPU or memory.
- Optimize Workflow: Break down long-running tasks into smaller, independent steps or jobs.
Why it works: By providing more resources or ensuring the runner isn’t bogged down, you allow it to respond promptly to the GitHub Actions control plane, preventing timeouts.
Cause 2: Network Connectivity Issues
The runner needs a stable connection to GitHub’s API endpoints to report its status. If this connection is lost or severely degraded, GitHub Actions won’t receive heartbeats from the runner, leading to a timeout.
Diagnosis: For self-hosted runners:
- Ping GitHub: From the runner machine, run ping github.com and ping api.github.com to check basic network reachability and latency. ICMP may be filtered along the path, so a failed ping is not conclusive on its own. curl -I https://api.github.com tests the HTTPS path the runner actually uses.
- Traceroute: Use traceroute (or tracert on Windows) to identify any network hops with high latency or packet loss between the runner and GitHub.
- Firewall/Proxy Logs: Check firewall and proxy logs for any blocked connections to *.github.com or *.githubusercontent.com on ports 80 and 443.
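The reachability check can also be scripted over HTTPS, which exercises the same port and TLS path the runner uses. This is a minimal bash sketch, assuming curl is installed. check_https and check_github_connectivity are hypothetical helper names, and the hosts in the usage line are only examples; extend them with the domains your firewall has to allow.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: verify outbound HTTPS from a self-hosted runner host.
check_https() {
  local host="$1" code
  # Any HTTP status (even 404) proves the TCP + TLS path works, so no --fail here.
  # -o /dev/null discards the body; -w prints only the status code.
  if code=$(curl -sS -o /dev/null -w '%{http_code}' \
      --connect-timeout 5 --max-time 15 "https://${host}/"); then
    echo "OK   ${host} (HTTP ${code})"
  else
    echo "FAIL ${host}" >&2
    return 1
  fi
}

# Check every host given as an argument; return non-zero if any of them fails.
check_github_connectivity() {
  local host failed=0
  for host in "$@"; do
    check_https "$host" || failed=1
  done
  return "$failed"
}
```

Usage: check_github_connectivity github.com api.github.com. A FAIL line on a host that otherwise has working routing often points at a proxy or firewall rule rather than the network path itself.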
For GitHub-hosted runners: This is less common. It can occur because of transient network issues within GitHub’s infrastructure. It can also occur when a step in your workflow changes the runner’s own network configuration, for example by connecting the runner to a VPN or routing its traffic through a proxy.
Fix: For self-hosted runners:
- Firewall/Proxy Rules: Ensure that your firewall or proxy allows unrestricted outbound traffic to *.github.com and *.githubusercontent.com on ports 80 and 443.
- Network Stability: Improve the network stability of the runner’s environment. This might involve upgrading network hardware, ensuring a stable internet connection, or reconfiguring your network.
- DNS Resolution: Verify that the runner can reliably resolve api.github.com and other GitHub domains. Try nslookup api.github.com.
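To check resolution for several domains at once, a small loop over getent works on most Linux runners. nslookup queries a DNS server directly. getent instead goes through the system resolver (including /etc/hosts), which is closer to how other programs on the host resolve names. check_dns is a hypothetical helper, and the domains in the usage line are examples.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which hostnames the system resolver can resolve.
# Returns non-zero if at least one hostname fails to resolve.
check_dns() {
  local host failed=0
  for host in "$@"; do
    if getent hosts "$host" >/dev/null; then
      echo "resolved:   ${host}"
    else
      echo "unresolved: ${host}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: check_dns github.com api.github.com. Running it in a loop for a few minutes can reveal intermittent resolver failures that a single nslookup would miss.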
For GitHub-hosted runners:
- Retry the Workflow: Often, transient network issues resolve themselves. Rerun the workflow.
- Check GitHub Status: Visit githubstatus.com to see if there are ongoing incidents impacting GitHub Actions.
Why it works: A stable network connection ensures the runner can continuously communicate with GitHub’s control plane, allowing it to send heartbeats and receive commands, thus preventing premature timeouts.
Cause 3: Runner Process Crashed or Was Terminated
The runner application itself might have crashed due to an unhandled exception, a bug, or an external process (like an OOM killer on Linux) terminating it.
Diagnosis: For self-hosted runners:
- Runner Logs: Check the logs of the GitHub Actions runner service. The exact location depends on your installation, but they are often in the _diag folder of the runner directory (for example, /actions-runner/_diag) or in the systemd journal. Look for error messages or stack traces.
- System Logs: Examine system logs (/var/log/syslog, dmesg, Windows Event Viewer) for any indication that the runner process was killed or crashed.
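Scanning the _diag folder by hand gets tedious, so the sketch below greps the runner’s log files for error lines. scan_runner_diag is a hypothetical helper. It assumes you pass the runner’s installation directory, and it also assumes that error lines contain an " ERR " level marker or an exception type name. That is an assumption about the log format, not a documented contract, so adjust the pattern if your logs differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the most recent error lines from a runner's _diag logs.
# Assumption: error lines contain " ERR " or an exception type name.
scan_runner_diag() {
  local runner_dir="$1"
  local lines="${2:-20}"
  local diag="${runner_dir%/}/_diag"
  local logs=("$diag"/*.log)

  # With no matching files the glob stays literal, so check the first entry exists.
  if [ ! -e "${logs[0]}" ]; then
    echo "No log files in ${diag}" >&2
    return 1
  fi
  # -h drops file-name prefixes; tail keeps only the last $lines matches.
  grep -hE ' ERR |Exception' "${logs[@]}" | tail -n "$lines"
}
```

Usage: scan_runner_diag ~/actions-runner 50 (the path is an example). If the matches stop abruptly with no error, check the system logs for an OOM kill, which points back to Cause 1.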
For GitHub-hosted runners: GitHub Actions automatically logs errors if the runner process crashes. You’ll typically see a specific error message in the workflow run output indicating the runner environment encountered an issue.
Fix: For self-hosted runners:
- Update Runner Software: Ensure you are running the latest version of the GitHub Actions runner software. Bugs that cause crashes are often fixed in newer releases.
- Environment Stability: If the runner is running on a system with other applications, ensure those applications aren’t interfering or causing instability.
- Resource Allocation: If the runner process is being killed by the OS (e.g., OOM killer), it points back to resource exhaustion (Cause 1). Increase RAM or tune system memory management.
For GitHub-hosted runners:
- Report to GitHub: If you suspect a bug in GitHub-hosted runners, you can report it through GitHub Support.
Why it works: Ensuring the runner application is stable and running correctly allows it to maintain its connection and report its status, preventing timeouts.
Cause 4: Long-Running, Non-Responsive Step
A specific step within your workflow might be stuck in an infinite loop, a deadlock, or a very long computation that never finishes, so the job keeps running until it reaches its time limit.
Diagnosis:
- Analyze Workflow Logs: Examine the logs for the job that timed out. Look at the last few steps executed. If a step hangs indefinitely without completing or showing progress, that’s your culprit.
- Add Debugging Output: Temporarily add echo commands or use set -x (for bash) at the beginning of your script steps to get more granular output and pinpoint where the step stalls.
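Below is a sketch of the set -x approach with a timestamped trace prefix. The last trace line in the log then shows both where a step stopped and when. traced_run and build_step are hypothetical names. The same two lines (the PS4 assignment and set -x) can also go directly at the top of a run: script.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command (or shell function) with xtrace enabled.
# Each traced line is prefixed with the wall-clock time and the script line number.
traced_run() {
  (
    # Single quotes defer expansion until each trace line is printed.
    PS4='+ [$(date +%H:%M:%S)] line ${LINENO}: '
    set -x
    "$@"
  )
}

# Example step: every command inside build_step is traced.
build_step() {
  echo "fetching dependencies"
  sleep 1
  echo "compiling"
}

traced_run build_step
```

Running the wrapper in a subshell keeps xtrace and the custom PS4 from leaking into the rest of the script.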
Fix:
- Optimize the Step: Refactor the code or command within that specific step to be more efficient.
- Break Down the Step: If a step performs a complex operation, try to break it down into smaller, sequential steps.
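- Add Heartbeat Output: If possible, modify the script to print output periodically (e.g., echo "Still working..." every 30 seconds). GitHub Actions does not cancel a step for being silent, so this output will not by itself prevent a timeout. It does show in the live log that the step is still running, and it makes the point where the step stalls easier to find.
- Increase Timeout: As a last resort for legitimate long-running tasks, raise the limit with timeout-minutes. You can set it on the job (jobs.<job_id>.timeout-minutes) or on an individual step (jobs.<job_id>.steps[*].timeout-minutes), and the value is in minutes. The default job timeout is already 360 minutes, which is also the maximum execution time for a job on a GitHub-hosted runner, so a longer limit is only possible on self-hosted runners. This is often a workaround, not a fix for the underlying issue.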
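Here is a minimal bash sketch of periodic progress output. with_heartbeat is a hypothetical wrapper that runs a command in the foreground while a background ticker prints a timestamped line every interval seconds. It then returns the command’s own exit status, so the step still fails when the command fails. Its purpose is visibility in the live log, and the 30-second interval in the usage line is arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command and print a progress line every N seconds.
with_heartbeat() {
  local interval="$1"; shift
  local rc=0 ticker_pid

  # Background ticker. sleep's output goes to /dev/null so that a sleep left
  # running after the ticker is killed does not hold the step's stdout open.
  (
    while sleep "$interval" >/dev/null 2>&1; do
      echo "Still working... ($(date +%H:%M:%S))"
    done
  ) &
  ticker_pid=$!

  # Run the real command in the foreground and remember its exit status.
  "$@" || rc=$?

  # Stop the ticker; "|| true" keeps this safe under "set -e".
  kill "$ticker_pid" 2>/dev/null || true
  wait "$ticker_pid" 2>/dev/null || true
  return "$rc"
}
```

Usage: with_heartbeat 30 ./run-integration-tests.sh (the script name is a placeholder). Running the command in the foreground rather than the ticker keeps its stdin and exit status intact.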
Why it works: Making the step more efficient or breaking it down keeps it inside the job’s time limit. Periodic output makes any remaining stall visible, so you can tell a slow step from a hung one.
Cause 5: GitHub Actions Service Interruption
While less common, there might be temporary issues with the GitHub Actions service itself that prevent runners from reporting in or the control plane from receiving those reports.
Diagnosis:
- Check GitHub Status Page: Visit githubstatus.com to see if there are any reported incidents affecting GitHub Actions.
Fix:
- Wait and Retry: If there’s an ongoing incident, the best course of action is to wait for GitHub to resolve the issue and then retry your workflow.
- Contact GitHub Support: If you suspect an issue that isn’t publicly reported, open a ticket with GitHub Support.
Why it works: This is an external issue that you cannot directly fix. Waiting for GitHub to resolve it is the only recourse.
Cause 6: Runner Stuck in a Loop Waiting for External Resource
Your job might be waiting for a response from an external service (e.g., an API, a database, a CI/CD tool) that is itself slow, unresponsive, or has timed out. The runner is technically "working" by waiting, but the step makes no progress. Without a timeout of its own, it can keep waiting until the job reaches its time limit.
Diagnosis:
- Examine Logs: Look for steps that involve external API calls, database queries, or interactions with other services.
- Add Timeouts to External Calls: If your script makes external calls, ensure they have appropriate timeouts configured. For example, curl --connect-timeout 10 --max-time 60 https://example.com/api.
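For commands that have no timeout option of their own, GNU coreutils timeout can impose one from the outside. The sketch below wraps it in a hypothetical run_with_deadline helper that reports when the limit was hit. It assumes GNU timeout, which exits with status 124 when it stops a command for running too long.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command with a hard deadline using GNU coreutils timeout.
run_with_deadline() {
  local seconds="$1"; shift
  local rc=0
  # Send TERM after $seconds; if the command ignores it, send KILL 10 s later
  # (timeout then exits with 137 instead of 124).
  timeout --kill-after=10 "$seconds" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "Timed out after ${seconds}s: $*" >&2
  fi
  return "$rc"
}
```

Usage: run_with_deadline 120 ./wait-for-service.sh (the script name is a placeholder). Because timeout starts a separate program, pass it a script or binary, not a shell function.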
Fix:
- Improve External Service Performance: Address performance issues with the external service if possible.
- Implement Timeouts: Add explicit timeouts to your external service calls within your workflow scripts.
- Add Fallback or Retry Logic: Implement retry mechanisms with exponential backoff for transient network issues or service unavailability.
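The retry advice can be sketched as a small bash function. retry_with_backoff is hypothetical: it retries a command up to a maximum number of attempts and doubles the delay after each failure, without jitter. Pair it with per-attempt timeouts, such as the curl flags shown earlier, so that a single attempt cannot hang.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command with exponential backoff.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2"; shift 2
  local attempt=1

  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    delay=$(( delay * 2 ))   # 2, 4, 8, ... when starting at 2
  done
}
```

Usage: retry_with_backoff 5 2 curl --fail --silent --show-error --connect-timeout 10 --max-time 60 https://example.com/api. The --fail flag makes curl return a non-zero exit code on HTTP errors, so those are retried too. curl also has its own --retry option if you prefer to keep the logic inside a single command.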
Why it works: By ensuring external calls don’t hang indefinitely and by handling temporary unavailability gracefully, you prevent the runner from getting stuck in a waiting state that leads to a timeout.
After fixing these, the next error you’ll likely encounter is a "Job not found" error if the runner process is completely gone or if the runner registration token has expired.