GitHub Actions Insights is a goldmine for understanding your CI/CD performance, but most people only scratch the surface. The most surprising thing is that it’s not just about identifying slow jobs; it’s about recognizing how inter-job dependencies and workflow structure create cascading delays you’d never spot by looking at individual job runtimes.
Let’s see it in action. Imagine this simple workflow:

    name: Build and Test
    on: [push]
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout code
            uses: actions/checkout@v4
          - name: Set up Python
            uses: actions/setup-python@v5
            with:
              python-version: '3.10'
          - name: Install dependencies
            run: pip install -r requirements.txt
          - name: Build artifact
            run: python setup.py sdist bdist_wheel
      test:
        runs-on: ubuntu-latest
        needs: build
        steps:
          - name: Checkout code
            uses: actions/checkout@v4
          - name: Set up Python
            uses: actions/setup-python@v5
            with:
              python-version: '3.10'
          - name: Install dependencies
            run: pip install -r requirements.txt
          - name: Run tests
            run: pytest
      deploy:
        runs-on: ubuntu-latest
        needs: test
        if: github.ref == 'refs/heads/main'
        steps:
          - name: Checkout code
            uses: actions/checkout@v4
          - name: Deploy to production
            run: echo "Deploying..."

When you push code, GitHub will first run the build job. Once build completes successfully, the test job will start. Only after test finishes will the deploy job (if the condition is met) execute.
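The ordering described above follows from the `needs` keys alone. As a minimal sketch (not how GitHub schedules jobs internally), the same order can be derived in Python by treating each `needs` entry as an edge in a dependency graph. The `needs` dictionary and the `execution_order` helper are illustrative names, not part of any GitHub API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each job maps to the set of jobs listed under its `needs:` key
# in the workflow above. `build` has no upstream jobs.
needs = {
    "build": set(),
    "test": {"build"},
    "deploy": {"test"},
}

def execution_order(needs):
    """Return one valid order in which the jobs can start."""
    return list(TopologicalSorter(needs).static_order())

print(execution_order(needs))
```

Because every job here has exactly one upstream job, only one order is possible. With jobs that share no dependency, several orders would be valid and those jobs could run at the same time.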
Now, let’s dive into GitHub Actions Insights. Navigate to your repository, click on "Actions," and then select "Insights" from the left-hand menu. You’ll see tabs for "Workflows," "Actions," and "Runners." The "Workflows" tab is where you’ll spend most of your time analyzing performance.
Here, you can see a list of your workflows, and for each workflow, you get aggregated metrics like average duration, success rate, and the number of runs. Clicking on a specific workflow reveals more granular data:
- Average Run Duration: This is the most straightforward metric, showing the typical time from start to finish for a workflow run.
- Job Durations: Within a workflow, you can drill down into individual jobs to see their average runtimes. This is where you start spotting bottlenecks. If your test job consistently takes 15 minutes, while build takes 2 minutes, you know where to focus optimization efforts.
- Success Rate: Crucial for understanding reliability. A low success rate might indicate flaky tests or environmental issues.
- Commit Activity: See how often a workflow is triggered.
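To make these aggregates concrete, here is a small sketch of how average duration and success rate fall out of a list of run records. The records are made-up sample data and the `summarize` helper is hypothetical; Insights may define or compute its figures differently.

```python
from statistics import mean

# Hypothetical run records: (duration in minutes, conclusion).
runs = [
    (9.5, "success"),
    (10.5, "success"),
    (24.0, "failure"),
    (10.0, "success"),
]

def summarize(runs):
    """Aggregate run records into the headline metrics discussed above."""
    durations = [duration for duration, _ in runs]
    successes = sum(1 for _, conclusion in runs if conclusion == "success")
    return {
        "run_count": len(runs),
        "avg_duration_min": mean(durations),
        "success_rate": successes / len(runs),
    }

print(summarize(runs))
```

Note how a single 24-minute failed run pulls the average well above the typical 10 minutes, which is why it helps to read average duration and success rate together.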
The real power comes when you start correlating these metrics. If your "Build and Test" workflow’s average duration suddenly jumps from 10 minutes to 25 minutes, you don’t just look at the build and test jobs individually. You look at their sequence.
Consider this scenario:
- build job takes 2 minutes.
- test job takes 5 minutes.
- deploy job takes 1 minute.
Total workflow time: 2 + 5 + 1 = 8 minutes.
Now, what if the build job’s dependency list (requirements.txt) grows, and the job now takes 10 minutes?
- build job takes 10 minutes.
- test job still takes 5 minutes.
- deploy job still takes 1 minute.
Total workflow time: 10 + 5 + 1 = 16 minutes. The workflow duration doubled even though only build changed: its 8 extra minutes (a fivefold increase for that job) pass straight through to the total.
But what if the test job relies on an artifact created by the build job, and the build job also needs to download a large dependency?
- build job (downloading large dependency, creating artifact): 15 minutes.
- test job (uses artifact, runs tests): 10 minutes.
- deploy job: 2 minutes.
Total workflow time: 15 + 10 + 2 = 27 minutes.
Now, let’s say you optimize the build job’s dependency download, reducing it to 5 minutes, but you don’t touch the test job.
- build job: 5 minutes.
- test job: 10 minutes.
- deploy job: 2 minutes.
Total workflow time: 5 + 10 + 2 = 17 minutes.
Making the build job 10 minutes faster cut the overall workflow time by the same 10 minutes (from 27 to 17), because the jobs run strictly in sequence. The test job, untouched at 10 minutes, is now the largest share of the run and the next bottleneck to address. GitHub Actions Insights shows you both the aggregated workflow duration and the average duration of each job, allowing you to pinpoint this shift. You can see how a change in one job’s duration directly moves the start time of every downstream job, and therefore the workflow’s total duration.
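The knock-on effect on start times can be sketched with a few lines of Python. This is a simplified model that assumes zero queue time and uses the durations from the scenario above. The `schedule` helper is an illustrative name, not something Insights provides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def schedule(durations, needs):
    """Return {job: (start, finish)} in minutes, assuming no queue time."""
    times = {}
    for job in TopologicalSorter(needs).static_order():
        # A job starts when the last of its upstream jobs has finished.
        start = max((times[dep][1] for dep in needs[job]), default=0)
        times[job] = (start, start + durations[job])
    return times

needs = {"build": set(), "test": {"build"}, "deploy": {"test"}}

before = schedule({"build": 15, "test": 10, "deploy": 2}, needs)
after = schedule({"build": 5, "test": 10, "deploy": 2}, needs)

for label, times in (("before", before), ("after", after)):
    total = max(finish for _, finish in times.values())
    print(label, times, "total:", total)
```

In this model the test job's own duration never changes, yet its start moves from minute 15 to minute 5, which is where the whole 10-minute saving comes from.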
The most powerful lever you control is not just optimizing individual steps within a job, but strategically restructuring your workflow. Can you parallelize jobs that don’t have dependencies? Can you split a monolithic test job into smaller, independent test suites that run concurrently? Comparing the workflow’s average run duration with its per-job averages in Insights shows the impact of these changes: it reveals how much of each run a job spends waiting for upstream jobs to complete, which is often a larger share of total runtime than the job’s own execution.
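To estimate what such a restructuring might buy before changing the workflow, you can compare the longest dependency chain (the critical path) of both layouts. The sketch below assumes the 10-minute test job could be split into a 4-minute `test_unit` job and a 6-minute `test_integration` job that both need only `build`. That split, the job names, and the `critical_path` helper are hypothetical, and queue time is ignored.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def critical_path(durations, needs):
    """Return (total minutes, jobs on the longest dependency chain)."""
    finish, blocker_of = {}, {}
    for job in TopologicalSorter(needs).static_order():
        # The slowest upstream job decides when this job can start.
        blocker = max(needs[job], key=lambda dep: finish[dep], default=None)
        start = finish[blocker] if blocker is not None else 0
        finish[job] = start + durations[job]
        blocker_of[job] = blocker
    # Walk back from the job that finishes last.
    job = max(finish, key=finish.get)
    total, path = finish[job], []
    while job is not None:
        path.append(job)
        job = blocker_of[job]
    return total, path[::-1]

serial = critical_path(
    {"build": 5, "test": 10, "deploy": 2},
    {"build": set(), "test": {"build"}, "deploy": {"test"}},
)

parallel = critical_path(
    {"build": 5, "test_unit": 4, "test_integration": 6, "deploy": 2},
    {
        "build": set(),
        "test_unit": {"build"},
        "test_integration": {"build"},
        "deploy": {"test_unit", "test_integration"},
    },
)

print("serial:", serial)
print("parallel:", parallel)
```

Under these assumptions the run drops from 17 to 13 minutes, and only the slower of the two test jobs sits on the critical path, so that is the one worth optimizing next.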
The next thing you’ll want to analyze is runner utilization and how it impacts job queuing times.