GitLab CI/CD’s caching mechanism is fundamentally about not doing work you’ve already done, but its dependency management is surprisingly nuanced, often leading to wasted builds that feel like cache misses when they’re actually just misconfigurations.

Let’s see it in action. Imagine you have a project that needs to install Node.js dependencies.

stages:
  - install
  - build

cache:
  key:
    files:
      - package-lock.json
  paths:
    - .npm/
    - node_modules/

install_dependencies:
  stage: install
  script:
    - echo "Installing Node.js dependencies..."
    - npm ci --cache .npm --prefer-offline
  tags:
    - docker

build_project:
  stage: build
  script:
    - echo "Building the project..."
    - npm run build
  tags:
    - docker

In this simple setup, the top-level cache section applies to every job in the pipeline. Its key is derived from package-lock.json, so as long as that file hasn’t changed, each job looks up the same cache entry. If the entry exists, the job extracts .npm/ and node_modules/ before its script runs. The install_dependencies job then runs npm ci --cache .npm --prefer-offline. npm ci always deletes node_modules/ and reinstalls from the lockfile, so the restored node_modules/ does not speed up this step; the restored .npm/ does. The --cache .npm option points npm’s download cache at .npm/, which sits inside the project directory where GitLab can cache it. The --prefer-offline flag tells npm to use that cache before hitting the network. When the job succeeds, both directories are uploaded under the same key. The build_project job then restores that cache and uses the node_modules/ it contains.

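The core problem GitLab CI/CD’s caching solves is the time and bandwidth cost of repeatedly downloading or building dependencies. You save files (like node_modules/ or a package manager’s download cache) at the end of one job and restore them at the start of later jobs and later pipelines. This can drastically speed up your pipelines. Caches are best-effort, though. A job must still work on a cache miss. GitLab’s separate artifacts mechanism is the reliable way to pass build outputs, such as compiled binaries, from one stage to the next.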
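The following is a minimal sketch of that split. It is a variant of the build_project job from the first example. The job still gets its dependencies from the inherited cache, but it publishes its output as an artifact. The sketch assumes, hypothetically, that npm run build writes to a dist/ directory. Substitute whatever path your build actually produces.

build_project:
  stage: build
  script:
    - npm run build
  artifacts:
    paths:
      - dist/            # hypothetical build output directory
    expire_in: 1 week    # GitLab deletes the artifacts after this period
  tags:
    - docker

By default, jobs in later stages download the artifacts of all jobs in earlier stages. A job that uses needs downloads only the artifacts of the jobs it lists.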

The mental model is this:

  1. Cache Key: This is the fingerprint of your cache. GitLab uses this to determine if a stored cache is still relevant. If the key changes, a new cache is created.
  2. Cache Paths: These are the directories or files that will be uploaded as the cache when a job succeeds.
  3. Cache Download: Before the job’s script runs, the runner checks whether a cache matching the job’s cache.key exists. If it does, the runner downloads and extracts that archive, restoring cache.paths into the project directory.
  4. Job Execution: The job runs. If it’s a build job, it might use the downloaded cache. If it’s an install job, it might populate the cache.
  5. Cache Upload: If the job succeeds, the runner archives the files and directories in cache.paths and stores them under cache.key. With a local cache, the archive stays on the runner’s disk. If distributed caching is configured, it goes to a shared object store such as S3.
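These steps map onto a handful of cache keywords. The sketch below writes out two defaults explicitly so you can see where each step is controlled. The first is policy: pull-push, which downloads the cache before the job and uploads it afterward. The second is when: on_success, which uploads only if the job succeeds. Listing either one is optional.

cache:
  key:                   # 1. the fingerprint used to look up the cache
    files:
      - package-lock.json
  paths:                 # 2. what gets archived and restored
    - .npm/
  policy: pull-push      # 3 and 5. download before the script, upload after it (default)
  when: on_success       # 5. upload only if the job succeeds (default)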

The key is crucial because it’s how GitLab decides which cache to restore. A common mistake is using a static key, like key: v1. Such a cache never starts fresh: every run restores and re-uploads the same entry, and stale files accumulate until someone edits the key by hand. File-based keys avoid this. The files directive takes a list of one or two files. GitLab computes a SHA checksum from the most recent commits that changed those files. When a commit changes either file, the checksum changes and so does the cache key. The next job then finds no cache under the new key and runs without one. When it succeeds, it uploads a fresh cache.
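As a sketch, here is a key built from two files, which is the maximum. The .nvmrc file is an assumption for illustration: some projects use it to pin the Node.js version. With this key, a commit that changes either the lockfile or the pinned runtime version produces a new key and therefore a fresh cache.

cache:
  key:
    files:                 # one or two files at most
      - package-lock.json
      - .nvmrc             # hypothetical: pins the Node.js version
  paths:
    - .npm/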

Here’s where it gets powerful: the key can be more complex. You can combine static strings with file checksums or predefined CI/CD variables. For example:

cache:
  key:
    files:
      - package-lock.json
    prefix: ${CI_COMMIT_REF_SLUG} # Cache per branch
  paths:
    - node_modules/

This configuration produces a key of the form <branch-slug>-<checksum>. The key is therefore unique per branch (CI_COMMIT_REF_SLUG) and per package-lock.json revision. This is a common and sensible pattern.

The prefix part of the key is often overlooked. It lets you prepend a string to the generated file checksum, which is useful for segmenting caches. For instance, using CI_COMMIT_REF_SLUG as a prefix keeps the cache for your main branch entirely separate from the cache for your develop branch, even if package-lock.json is identical. This prevents cross-branch contamination: files cached by one branch never leak into another. The trade-off is that each new branch starts with an empty cache.
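One way to soften that cold start is to fall back to the default branch’s cache when a branch has none yet. This assumes GitLab 16.0 or later, where cache:fallback_keys is available. A file-based key’s checksum can’t be written out by hand, so this sketch switches to a plain per-branch string key. The cache- prefix is an arbitrary naming choice.

cache:
  key: cache-$CI_COMMIT_REF_SLUG       # this branch's own cache
  fallback_keys:
    - cache-$CI_DEFAULT_BRANCH         # tried only if the branch has no cache yet
  paths:
    - .npm/

A branch job that restores the fallback still uploads under its own key, so it doesn’t overwrite the default branch’s cache.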

Caches are also not limited to the job that creates them. In the pipeline below, a single top-level cache is shared by every job. If your jobs need different sets of dependencies, you can instead define a separate cache block inside each job, each with its own key and paths.

stages:
  - install
  - test
  - build

cache:
  key:
    files:
      - package-lock.json
  paths:
    - .npm/
    - node_modules/

install_node_deps:
  stage: install
  script:
    - npm ci --cache .npm --prefer-offline
  tags:
    - docker

test_js:
  stage: test
  script:
    - npm test
  needs:
    - install_node_deps # Ensure dependencies are installed first
  tags:
    - docker

build_app:
  stage: build
  script:
    - npm run build
  needs:
    - install_node_deps
  tags:
    - docker

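In this example, install_node_deps populates the cache. Because the cache is defined at the top level, test_js and build_app compute the same key and restore the node_modules/ it uploaded. Stage order already guarantees that install_node_deps finishes first. The needs keyword additionally lets test_js and build_app start as soon as install_node_deps finishes, instead of waiting for every job in earlier stages, so build_app can run alongside test_js. Keep in mind that the cache only reaches those jobs if they run on the same runner or on runners that share a distributed cache. Otherwise, they start without node_modules/.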
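Two refinements follow from this. First, with a top-level cache, every job uses the default pull-push policy, so test_js and build_app also re-upload node_modules/ when they finish. Setting policy: pull on these consumer jobs skips that redundant upload. Second, a job that needs different dependencies can declare its own cache block, which replaces the top-level one for that job. The sketch below shows both. The lint_python job, its requirements.txt (assumed to list flake8), and the .pip-cache/ directory are hypothetical.

test_js:
  stage: test
  script:
    - npm test
  needs:
    - install_node_deps
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull              # restore only; skip the upload after the job
  tags:
    - docker

lint_python:                  # hypothetical job with its own dependencies
  stage: test
  script:
    - pip install --cache-dir .pip-cache -r requirements.txt
    - python -m flake8 .
  cache:
    key:
      files:
        - requirements.txt    # hypothetical dependency file for this job
    paths:
      - .pip-cache/           # pip's download cache, kept inside the project directory
  tags:
    - docker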

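The one thing most people don’t realize is how broadly a cache entry is shared. Any job in any pipeline that computes the same key restores and uploads the same entry, unless you segment it with CI_COMMIT_REF_SLUG or another prefix. Where the entry physically lives depends on the runner: a local cache stays on one runner, while a distributed cache is shared by every runner configured to use it. Recent GitLab versions also keep separate caches for protected and unprotected branches by default. If multiple pipelines run concurrently for the same branch (e.g., MR pipelines), they all compete for that entry. Each successful job overwrites the entry on upload, so the last one to finish wins. Another pipeline may then restore files produced for a different commit than the one it is building. This can lead to flaky builds where one pipeline passes and another fails.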

Understanding and correctly configuring cache keys, prefixes, and paths is fundamental to optimizing your GitLab CI/CD pipelines and avoiding costly rebuilds.
