GitLab CI pipelines in large repositories often become a bottleneck, not because of the jobs themselves, but because the Git fetch at the start of each job takes so long.

Let’s see a pipeline in action. Imagine you have a monorepo with hundreds of microservices. Your .gitlab-ci.yml looks something like this:

stages:
  - build
  - test
  - deploy

build_service_a:
  stage: build
  script:
    - echo "Building service A..."
    - ./build.sh service_a
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'

test_service_a:
  stage: test
  script:
    - echo "Testing service A..."
    - ./test.sh service_a
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'

build_service_b:
  stage: build
  script:
    - echo "Building service B..."
    - ./build.sh service_b
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'

test_service_b:
  stage: test
  script:
    - echo "Testing service B..."
    - ./test.sh service_b
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'

When this pipeline runs, each job, from build_service_a to test_service_b, first fetches the repository, by default as a shallow fetch at the project’s configured depth. In a massive repository, this single operation can take minutes, easily doubling or tripling your pipeline’s total execution time. The actual build or test commands are trivial in comparison.

The core problem is that every single job starts by getting its own copy of the source: GitLab Runner fetches into an existing working copy when one is available on the runner, and otherwise clones from scratch. When your repository has hundreds of thousands of commits and millions of files, this becomes incredibly inefficient. Jobs don’t need the entire history or the entire tree; they often only need the files relevant to their specific task or service.

Here’s how to tackle this:

1. Optimize Git Fetch Depth: The default GIT_DEPTH is 20 for projects created on recent GitLab versions, and 50 for many older projects. For very large repos, this can still be too much.

  • Diagnosis: Look at the job logs. The Fetching changes with git depth set to... line will show the depth.
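  • Fix: Set GIT_DEPTH to a small number, like 1 or 5. Very low values can break jobs that need earlier commits (git describe, diffing against a merge base), and with a depth of 1, queued or retried jobs may fail once the branch has moved on.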
    variables:
      GIT_DEPTH: 5 # or 1
    
  • Why it works: Reduces the amount of history Git needs to download, speeding up the initial checkout.
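
A global GIT_DEPTH does not have to apply to every job. As a rough sketch, the fragment below keeps a shallow default and lifts it for one job that needs the full history. The release_notes job and its generate_changelog.sh script are hypothetical names used only for illustration.

```yaml
variables:
  GIT_DEPTH: "5"            # shallow default for every job

release_notes:              # hypothetical job that walks the commit history
  stage: deploy
  variables:
    GIT_DEPTH: "0"          # 0 disables shallow fetching for this job only
  script:
    - ./generate_changelog.sh   # hypothetical script, not part of the pipeline above
```

Job-level variables override the global value, so only the job that needs history pays for it.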

2. Keep GIT_STRATEGY: fetch with GIT_SUBMODULE_STRATEGY: none (if you don’t use submodules): GIT_STRATEGY: clone throws away the working copy and starts from scratch on every job, which makes it the slowest option. fetch reuses the working copy a previous job left on the runner and only downloads what changed.

  • Diagnosis: Check whether the Git fetch and checkout steps at the top of the job log are the longest part of the execution, and whether GIT_STRATEGY has been set to clone anywhere.
  • Fix:
    variables:
      GIT_STRATEGY: fetch
      GIT_SUBMODULE_STRATEGY: none
    
  • Why it works: fetch updates an existing working copy instead of re-downloading it, falling back to a fresh clone only when no working copy exists. These values are normally the defaults, so setting them explicitly mainly guards against a slower setting inherited from the project or runner.
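
Some jobs never touch the source tree at all, and for those the fastest fetch is no fetch. A minimal sketch, assuming a hypothetical notify job that only calls an external webhook; NOTIFY_URL is an assumed CI/CD variable and the job image is assumed to ship curl.

```yaml
notify:                     # hypothetical job, not part of the pipeline above
  stage: deploy
  variables:
    GIT_STRATEGY: none      # skip all Git operations for this job
  script:
    - 'curl --fail --silent --request POST "$NOTIFY_URL"'
```

With GIT_STRATEGY: none the job cannot rely on any repository files, including scripts such as build.sh, and the build directory may still contain leftovers from earlier jobs on the same runner.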

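3. Trim the Fetch with GIT_FETCH_EXTRA_FLAGS: With a shallow depth the runner already limits the fetch to the refs a job needs, but you can pass extra flags to git fetch, for example to skip tags.

  • Diagnosis: The repository carries thousands of tags that your jobs never use, and the fetch step spends time on them.
  • Fix:
    variables:
      GIT_FETCH_EXTRA_FLAGS: --prune --no-tags
    
  • Why it works: --no-tags stops Git from downloading tags and the objects only they reference. Setting the variable replaces the default flags (--prune --quiet), so repeat any you want to keep. Clone-only flags such as --branch and --single-branch are not valid for git fetch and will not work here.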

4. Use a Git Server Mirror: If you have a dedicated Git server, you can configure GitLab Runner to use a local mirror.

  • Diagnosis: Your runners are far from your main Git server, or you have extremely high network latency.
  • Fix: Point your runners at a Git endpoint close to them, for example by setting clone_url in the runner’s config.toml to a nearby mirror such as a GitLab Geo secondary site. This is a more advanced setup; consult GitLab’s documentation for details.
  • Why it works: Reduces network latency by fetching from a local source instead of a remote one.

5. Artifacts for Dependencies (if applicable): If your services depend on build artifacts from other services, use GitLab’s artifact system.

  • Diagnosis: Job A builds something, and Job B needs that output. Instead of Job B rebuilding it, it downloads the artifact.
  • Fix:
    build_service_a:
      stage: build
      script:
        - ./build.sh service_a
      artifacts:
        paths:
          - build/service_a/
        expire_in: 1 week
    
    test_service_b:
      stage: test
      script:
        - ./test.sh service_b --dependency build/service_a/
      dependencies:
        - build_service_a
    
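  • Why it works: The dependent job downloads the built output instead of rebuilding it. Artifacts alone do not skip the repository checkout; a job that needs nothing from the repository can also set GIT_STRATEGY: none.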
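
A job that only consumes artifacts can skip Git entirely by combining them with GIT_STRATEGY: none. A hedged sketch, assuming a hypothetical package_service_a job that just archives the output of build_service_a:

```yaml
package_service_a:          # hypothetical job, not part of the pipeline above
  stage: deploy
  needs:
    - job: build_service_a
      artifacts: true       # download build/service_a/ from the build job
  variables:
    GIT_STRATEGY: none      # nothing from the repository is needed here
  script:
    - tar -czf service_a.tar.gz build/service_a/
  artifacts:
    paths:
      - service_a.tar.gz
```

This only works when the script uses nothing from the repository, which is why the sketch calls tar directly instead of a checked-in script.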

6. Sparse Checkout: For truly massive monorepos, sparse-checkout can be a lifesaver. It tells Git to only check out specific directories within the repository.

  • Diagnosis: You only need files for service_a and service_b, but the repo has 1000 services.
  • Fix:
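    variables:
      GIT_CHECKOUT: "false"   # let the runner fetch, but check out ourselves
    script:
      - git sparse-checkout set service_a service_b   # cone mode, needs Git 2.25+
      - git checkout --quiet --detach "$CI_COMMIT_SHA"
      # your build/test commands follow
    
    The sparse patterns have to be in place before the working tree is populated. Configuring them in script after the runner’s normal checkout is too late, because every file has already been written. You might need to adjust the GIT_DEPTH and GIT_STRATEGY variables in conjunction with this.
  • Why it works: Git only writes the directories you list (plus top-level files) into the working tree, drastically reducing its size and the time to check out. On its own this does not shrink the fetch; that requires combining it with a partial clone.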
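
Sparse checkout limits what lands in the working tree, not what is transferred. To cut the download as well, one option is to take over the Git work in the job and pair sparse checkout with a partial clone filter. The sketch below is an assumption-heavy illustration, not a tested recipe: it assumes Git 2.25 or newer in the job image, a server that allows partial clone filters and fetching a commit by SHA, and it uses a hypothetical src directory.

```yaml
build_service_a:
  stage: build
  variables:
    GIT_STRATEGY: none      # the runner does no Git work; the script does it instead
  script:
    - rm -rf src && git init --quiet src && cd src
    - git remote add origin "$CI_REPOSITORY_URL"
    - git sparse-checkout set service_a          # this directory plus top-level files
    - git fetch --quiet --depth 1 --filter=blob:none origin "$CI_COMMIT_SHA"
    - git checkout --quiet FETCH_HEAD            # file contents outside the sparse set are not downloaded
    - ./build.sh service_a
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```

The --filter=blob:none flag defers downloading file contents until checkout, so only the files inside the sparse set are requested. The trade-off is that the job no longer benefits from a working copy cached on the runner.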

7. Runner Configuration (config.toml): Globally set Git strategies and depths for your runners.

  • Diagnosis: You want these optimizations applied to all jobs on a specific set of runners.
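  • Fix: GitLab Runner has no dedicated Git section in config.toml, but you can set the same variables for every job through the environment key of a [[runners]] entry:
    [[runners]]
      environment = ["GIT_DEPTH=5", "GIT_STRATEGY=fetch", "GIT_SUBMODULE_STRATEGY=none"]
    
  • Why it works: The runner injects these variables into every job it picks up, reducing the need to specify them in every .gitlab-ci.yml file. Check a job log afterwards to confirm the depth and strategy are actually applied.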

The next hurdle you’ll likely encounter is dealing with large Docker image builds, which have their own set of optimization strategies like layer caching and multi-stage builds.
