A GitLab CI DAG pipeline doesn’t force jobs to run in parallel; it lets each job start as soon as the specific jobs it depends on have finished, instead of waiting for an entire stage to complete.
Let’s watch a DAG pipeline in action. Imagine a project with a frontend, a backend, and a deployment step, configured in .gitlab-ci.yml like this:
stages:
  - build
  - test
  - deploy

build_frontend:
  stage: build
  script:
    - echo "Building frontend..."
    - sleep 5  # Simulate build time
    - echo "Frontend built."
  artifacts:
    paths:
      - frontend/build/

build_backend:
  stage: build
  script:
    - echo "Building backend..."
    - sleep 5  # Simulate build time
    - echo "Backend built."
  artifacts:
    paths:
      - backend/build/

test_frontend:
  stage: test
  needs:
    - build_frontend
  script:
    - echo "Testing frontend..."
    - sleep 3  # Simulate test time
    - echo "Frontend tested."

test_backend:
  stage: test
  needs:
    - build_backend
  script:
    - echo "Testing backend..."
    - sleep 3  # Simulate test time
    - echo "Backend tested."

deploy_app:
  stage: deploy
  needs:
    - test_frontend
    - test_backend
  script:
    - echo "Deploying application..."
    - sleep 10  # Simulate deploy time
    - echo "Application deployed."
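In this setup, build_frontend and build_backend can run at the same time because both are in the build stage, which is the first stage, and neither depends on the other. As soon as build_frontend completes, test_frontend can start, even if build_backend is still running; likewise, test_backend waits only for build_backend. Only after both test_frontend and test_backend succeed can deploy_app begin. The needs entries form a Directed Acyclic Graph (DAG) of dependencies, in which each job runs once its predecessors in the graph have finished. With the simulated durations above, both chains take about 8 seconds (5 to build, 3 to test), so deploy_app starts at roughly the same moment it would in a stage-based pipeline; the gain shows up when one chain is slower than the other.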
The problem this solves is the stage-by-stage "waterfall" pipeline, where every job in a stage must wait for all jobs in the previous stage to finish, even jobs it has no logical dependency on. That is inefficient when one slow job holds up otherwise unrelated work. With needs, we declare the real dependencies explicitly, and GitLab CI can start each job as soon as those are satisfied. Instead of a rigid stage-by-stage flow, we get an execution order that follows the dependency graph.
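For contrast, here is a sketch of what one of the test jobs would look like in a stage-only pipeline. It is the same test_frontend job from the pipeline above with its needs block removed; nothing else is assumed.

```yaml
# Stage-only variant (sketch): no `needs`, so the job follows stage order.
test_frontend:
  stage: test
  # Without `needs`, this job starts only after ALL jobs in the `build`
  # stage (build_frontend AND build_backend) have finished successfully.
  script:
    - echo "Testing frontend..."
    - sleep 3  # Simulate test time
    - echo "Frontend tested."
```

If build_backend were slow, this version of test_frontend would sit idle until it finished, whereas the needs version would already be running.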
Conceptually, GitLab CI tracks a dependency graph alongside its job queue. When a pipeline starts, the jobs whose prerequisites are already satisfied become eligible to run: jobs in the first stage, and jobs declared with an empty needs list (needs: []). Jobs that have no needs keyword at all still follow stage order. Eligible jobs are picked up by available runners. As each job completes, GitLab CI checks which remaining jobs now have all their needs satisfied and makes them available to runners. This ensures that deploy_app never starts until both test_frontend and test_backend are green.
The key levers you control are the needs keyword and the stage. For jobs without needs, the stage defines the order (all build jobs before any test jobs, and so on). A job with needs ignores stage ordering and waits only for the jobs it lists. You can list multiple jobs in needs, in which case the job runs only after all of them have completed successfully. By default, a job also downloads the artifacts of the jobs named in its needs, and only those, which makes it practical to split a monolithic build job into smaller ones. For example, test_frontend doesn’t need to know about build_backend; it only cares that build_frontend is done.
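The fragment below sketches two common variations of needs. The lint job is hypothetical and not part of the pipeline above; the second job is a variant of test_frontend shown only to illustrate the artifacts option, on the assumption that the tests would not need the build output.

```yaml
# Hypothetical job: an empty `needs` list lets it start as soon as the
# pipeline is created, without waiting for the build stage.
lint:
  stage: test
  needs: []
  script:
    - echo "Linting..."

# Variant of test_frontend: wait for build_frontend to succeed, but do not
# download its artifacts (frontend/build/).
test_frontend:
  stage: test
  needs:
    - job: build_frontend
      artifacts: false
  script:
    - echo "Testing frontend..."
```

Use artifacts: false only when the job needs the ordering but not the files; the test jobs in a real project would usually keep the default so the build output is available.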
A common misunderstanding is that needs implies parallel execution. It doesn’t; needs only defines when a job may start. How many jobs actually run at once depends on runner capacity and on how many jobs are ready at the same moment. If a job with needs is the only one ready, it simply runs on its own. The benefit appears when several jobs have their needs satisfied at the same time and enough runners are available to pick them up.
The next concept you’ll likely explore is using needs with trigger jobs to orchestrate multi-project pipelines, creating even more complex and efficient workflows.
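As a rough preview, a trigger job can carry needs just like a regular job. In the sketch below, the job name deploy_downstream and the project path my-group/my-deployment-project are hypothetical placeholders, not something defined earlier in this pipeline.

```yaml
# Sketch: start a pipeline in another project once both test jobs pass.
deploy_downstream:
  stage: deploy
  needs:
    - test_frontend
    - test_backend
  trigger:
    project: my-group/my-deployment-project  # hypothetical project path
    branch: main                             # assumed default branch name
```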