GitLab CI YAML pipelines are declarative definitions of your continuous integration and deployment process, but their syntax can feel like a labyrinth of nested structures and special keywords.
Let’s watch a pipeline in action. Imagine you have a .gitlab-ci.yml file like this:
stages:
  - build
  - test
  - deploy

build_app:
  stage: build
  script:
    - echo "Building the application..."
    - make build
  artifacts:
    paths:
      - build/

test_app:
  stage: test
  needs: [build_app]
  script:
    - echo "Running tests..."
    - make test
  dependencies:
    - build_app

deploy_to_staging:
  stage: deploy
  needs: [test_app]
  script:
    - echo "Deploying to staging..."
    - ./deploy.sh staging
  when: manual
  only:
    - main
When a commit is pushed to the main branch, GitLab reads this configuration and creates a pipeline; GitLab Runner then picks up and executes the individual jobs. The stages list defines the order: build, test, then deploy. The build_app job, belonging to the build stage, executes first. Its script commands run in a fresh environment. If the job succeeds, the contents of the build/ directory are uploaded as artifacts and become available to subsequent jobs.
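Next, the test_app job, in the test stage, will run. The needs: [build_app] directive lets test_app start as soon as build_app completes successfully, and by default it also fetches build_app’s artifacts. The dependencies: - build_app entry restricts the artifact download to build_app alone; without needs or dependencies, a job downloads artifacts from every job in earlier stages. Here it is redundant with needs, but it makes the intent explicit. Either way, the runner downloads the build/ directory before test_app’s script executes, so the tests operate on the built application.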
Finally, the deploy_to_staging job, in the deploy stage, waits for test_app to finish. However, its when: manual means it won’t run automatically. A user must explicitly click the "play" button in the GitLab UI. The only: - main clause restricts this job to pipelines for the main branch. Note that only/except is no longer actively developed; GitLab’s documentation recommends rules for new configuration.
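As a rough sketch, the same deployment job could be written with rules instead of only. This assumes the rest of the pipeline is unchanged. One subtlety: a manual job declared inside rules blocks the pipeline by default, so allow_failure: true is added here to keep the non-blocking behaviour of a plain when: manual.

```yaml
deploy_to_staging:
  stage: deploy
  needs: [test_app]
  script:
    - echo "Deploying to staging..."
    - ./deploy.sh staging
  rules:
    # Offer the job as a manual action, and only on the main branch.
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
      # Keep the pipeline from being marked "blocked" while the job waits.
      allow_failure: true
    # No other rule matches, so the job is left out of all other pipelines.
```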
The core problem GitLab CI YAML solves is codifying complex, multi-step software delivery workflows into a single, version-controlled file. It allows you to automate building, testing, and deploying your applications reliably and repeatably. Internally, GitLab CI uses a system of jobs, stages, and dependencies. Jobs are the fundamental units of work, defined with a script that executes commands. Jobs are grouped into stages, which define the default execution order: all jobs in a stage must succeed before jobs in the next stage can begin. Two keywords refine this: needs changes the execution order, letting a job start as soon as specific earlier jobs finish, while dependencies only controls which artifacts a job downloads.
The script keyword is the heart of any job, accepting a list of shell commands to be executed. The commands run sequentially in a dedicated job environment, and the job fails as soon as one of them returns a non-zero exit code. The artifacts keyword allows you to persist files generated by a job, such as compiled binaries or test reports, making them accessible to downstream jobs in the same pipeline. cache looks similar but serves a different purpose: it speeds up later jobs and pipeline runs by reusing files that are expensive to recreate, such as downloaded dependencies, and it is best-effort rather than guaranteed. variables lets you define environment variables for your jobs, which is useful for non-sensitive configuration; secrets should not be committed to .gitlab-ci.yml but stored as masked or protected CI/CD variables in the project settings. rules and only/except provide fine-grained control over when jobs are executed, based on branch names, tags, commit messages, or other conditions.
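The keywords above might be combined as in the following fragment. It is only a sketch: the variable name BUILD_MODE, the .deps/ cache path, and the one-week expiry are illustrative assumptions, not values taken from the pipeline discussed earlier.

```yaml
variables:
  # Non-secret configuration only. BUILD_MODE is a hypothetical name.
  BUILD_MODE: "release"

build_app:
  stage: build
  cache:
    # One cache per branch; .deps/ stands in for wherever your
    # toolchain stores downloaded dependencies.
    key: $CI_COMMIT_REF_SLUG
    paths:
      - .deps/
  script:
    - echo "Building in $BUILD_MODE mode..."
    - make build
  artifacts:
    # Artifacts are handed to later jobs in this pipeline.
    paths:
      - build/
    # Assumed retention period; adjust to your needs.
    expire_in: 1 week
```

The split is deliberate: build/ is output that later jobs must receive, so it is an artifact, while .deps/ is merely expensive to re-download, so it is cached.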
The needs keyword is often confused with dependencies, but the two do different things. dependencies only controls which artifacts a job downloads; it has no effect on execution order, which remains stage by stage. needs changes the execution order itself, allowing more complex, non-linear pipeline graphs. A job with needs can depend on several jobs from earlier stages, not just the immediately preceding one (recent GitLab versions also allow jobs in the same stage), and it starts as soon as those jobs succeed. By default it downloads their artifacts, and you can opt out per needed job with artifacts: false. This unlocks more efficient pipelines where jobs run as soon as their specific prerequisites are met, rather than waiting for an entire stage to finish.
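A minimal sketch of such a non-linear pipeline is shown below. The build_docs, lint, and smoke_check jobs, along with their make targets, are hypothetical additions for illustration.

```yaml
stages:
  - build
  - test

build_app:
  stage: build
  script:
    - make build
  artifacts:
    paths:
      - build/

build_docs:
  stage: build
  script:
    - make docs   # hypothetical slow job

test_app:
  stage: test
  # Starts as soon as build_app succeeds, even if build_docs is still
  # running, and downloads build_app's artifacts.
  needs: [build_app]
  script:
    - make test

smoke_check:
  stage: test
  needs:
    # Wait for build_app, but skip downloading its artifacts.
    - job: build_app
      artifacts: false
  script:
    - echo "build_app finished"

lint:
  stage: test
  # An empty list means "wait for nothing": the job starts immediately.
  needs: []
  script:
    - make lint   # hypothetical target
```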
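The most surprising thing about GitLab CI’s rules is how they can be used to dynamically control job execution without resorting to complex scripting within the script block itself. Rules are evaluated in order and the first match decides; if none match, the job is not added to the pipeline. You can define conditions based on the presence of specific files (exists), the value of CI variables (if), or how the pipeline was started, for example from the UI or by a schedule (via the CI_PIPELINE_SOURCE variable). For instance, a rule like changes: ["src/**/*"] will cause a job to run only if files under the src/ directory have changed, which can drastically shorten pipelines for projects with distinct modules. Note the trailing /*: the glob src/**/* matches files at any depth, whereas src/** on its own may match only the direct children of src/.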
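The fragment below sketches how these conditions could be combined. RUN_ALL_TESTS is a hypothetical variable, and the container_check job exists only to illustrate exists.

```yaml
test_app:
  stage: test
  script:
    - make test
  rules:
    # 1. Always run when someone starts the pipeline from the GitLab UI.
    - if: $CI_PIPELINE_SOURCE == "web"
    # 2. Run when a (hypothetical) variable forces the full suite.
    - if: $RUN_ALL_TESTS == "true"
    # 3. Otherwise run only if something under src/ changed.
    - changes:
        - src/**/*
    # If no rule matches, the job is left out of the pipeline.

container_check:
  stage: test
  script:
    - echo "Dockerfile found"
  rules:
    # Add the job only when the repository contains a Dockerfile.
    - exists:
        - Dockerfile
```

Because the first matching rule wins, put the most specific or most restrictive conditions at the top of the list.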
The next concept you’ll likely grapple with is optimizing pipeline performance through advanced caching strategies and parallel job execution.