Git’s ability to track changes over time is what makes it so powerful, but that history can also become a performance bottleneck if not managed.

Let’s see Git’s maintenance in action. Imagine you’ve been working on a project, frequently committing small changes, maybe rebasing often, and perhaps even deleting and recreating branches. This activity, while normal, can lead to a fragmented repository.

# First, let's look at the current state before any maintenance
git count-objects -vH

# This might show a large number of loose objects (the "count" line) and several packs (the "packs" line)

# Now, let's perform a basic garbage collection
git gc

# After gc, let's check again
git count-objects -vH

You’ll likely observe a reduction in loose objects and a more consolidated packfile. This is Git reorganizing its internal storage to be more efficient.
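To try this without touching a real project, the before-and-after comparison can be reproduced in a throwaway repository. The following is a minimal sketch, assuming a POSIX shell with git and awk available; the scratch directory, the placeholder identity, and the `field` helper are illustrative and not part of Git.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository so the experiment never touches real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Each small commit adds a loose blob, tree, and commit object.
for i in 1 2 3 4 5; do
    echo "change $i" > "file$i.txt"
    git add "file$i.txt"
    git commit -q -m "commit $i"
done

# Print one field (such as "count" or "in-pack") from `git count-objects -v`.
field() {
    git count-objects -v | awk -v key="$1:" '$1 == key { print $2 }'
}

loose_before=$(field count)
packed_before=$(field in-pack)

git gc -q

loose_after=$(field count)
packed_after=$(field in-pack)

echo "loose objects:  $loose_before -> $loose_after"
echo "packed objects: $packed_before -> $packed_after"
```

Because every object in this scratch repository is reachable, the loose count should drop to zero and the same number of objects should appear under "in-pack".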

The core problem Git maintenance solves is repository bloat and fragmentation. Git first writes each object (commit, tree, or blob) as an individual file. When you delete branches, amend commits, or perform other history-rewriting operations, older versions of these objects linger until they are pruned. This leads to:

  • Increased disk space usage: More objects mean more storage.
  • Slower operations: Git has to sift through more data to find what it needs for commands like git log, git blame, or even fetching.
  • Larger network transfers: When cloning or fetching, Git sends compressed history, and a bloated history means larger downloads.

git gc (garbage collection) is the primary command for this. It performs several crucial tasks:

  1. Consolidates loose objects into packfiles: Git stores new objects initially as "loose objects." git gc gathers these loose objects and compresses them into efficient "packfiles," which are single files containing many objects. This reduces the number of files Git needs to manage.
  2. Optimizes packfiles: It can repack existing packfiles and apply delta compression, which stores an object as its differences (a delta) from a similar base object and saves significant space.
  3. Removes unreachable objects: Objects that are no longer reachable from any branch, tag, reflog entry, or other reference are candidates for deletion. By default git gc prunes only those older than two weeks (gc.pruneExpire); git gc --prune=now removes them immediately.
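The third task is easy to misread, because a deleted branch does not make its commits unreachable right away: the reflog still points at them. The sketch below illustrates this in a throwaway repository. It assumes a POSIX shell with git installed; the branch name topic, the file names, and the `exists` helper are made up for the example.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base commit"
main_branch=$(git symbolic-ref --short HEAD)

# Commit on a throwaway branch, then delete the branch so that only
# HEAD's reflog still remembers the commit.
git checkout -q -b topic
echo "experiment" > experiment.txt
git add experiment.txt
git commit -q -m "experimental commit"
orphan=$(git rev-parse HEAD)
git checkout -q "$main_branch"
git branch -q -D topic

# Report whether an object is still present in the object database.
exists() {
    if git cat-file -e "$1" 2>/dev/null; then echo yes; else echo no; fi
}

# The reflog entry keeps the commit reachable, so gc must keep it.
git gc -q --prune=now
after_first_gc=$(exists "$orphan")

# Once the reflog entries are expired, nothing references the commit.
git reflog expire --expire=now --all
git gc -q --prune=now
after_second_gc=$(exists "$orphan")

echo "after first gc:  commit present? $after_first_gc"
echo "after second gc: commit present? $after_second_gc"
```

Expiring reflogs discards the safety net that lets you recover deleted work, so this combination is best reserved for repositories where you are sure the old objects are no longer needed.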

Common Causes of Repository Bloat and Their Fixes:

  • Excessive loose objects: This happens after many commits without a gc run.

    • Diagnosis: git count-objects -vH will show a high number of "loose objects."
    • Fix: Run git gc. This command will repack these loose objects into a packfile.
    • Why it works: Packfiles are a more efficient storage format than individual loose objects, reducing file I/O.
  • Large binary files committed unintentionally: Even after deletion, Git retains history.

    • Diagnosis: Use git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | awk '$1 == "blob"' | sort -k3 -nr | head -n 20 to list the 20 largest blobs together with their paths. Or use tools like git-sizer.
    • Fix: Use git filter-repo (preferred over bfg or filter-branch) to rewrite history and remove the large files. For example, to remove a file named large_binary.zip:
      git filter-repo --path large_binary.zip --invert-paths
      
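      git filter-repo refuses to run outside a fresh clone unless you pass --force. It expires reflogs and repacks the repository when it finishes, so a follow-up git gc --aggressive --prune=now is optional.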
    • Why it works: filter-repo rewrites commit history, and gc cleans up the now-unreachable old objects containing the large file.
  • Numerous small, frequent commits without consolidation: While good for granular tracking, too many can fragment the object database over time.

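    • Diagnosis: git log --oneline --graph --decorate might show a very dense history, and git count-objects -vH reports the resulting object counts. (Delta chain lengths are reported by git verify-pack -v on a pack's .idx file, not by count-objects.)
    • Fix: Periodically use git rebase -i to squash commits that have not been shared with others yet. Follow this with git gc.
    • Why it works: Squashing combines multiple commits into one. Once the original commits are no longer referenced by any branch or reflog entry, gc can prune them, which reduces the number of objects and simplifies the history graph.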
  • Stale .git/objects/pack files: Sometimes, older, less efficient packfiles can be left behind.

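    • Diagnosis: git count-objects -v reports the number of packs ("packs") and any files it does not recognize ("garbage"). You can also examine the contents of .git/objects/pack/ directly.
    • Fix: git gc typically consolidates the existing packs into a single pack. git gc --prune=now additionally removes unreachable objects immediately instead of waiting for the default two-week grace period.
    • Why it works: Fewer packs mean fewer pack indexes to search on each object lookup, and pruning removes the cruft that older packs carried.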
  • Accidental commits of large files that were later removed: Similar to large binary files, but might be transient.

    • Diagnosis: Use git rev-list --all --objects | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sort -k3 -nr | head -n 10 to find large objects.
    • Fix: Use git filter-repo to strip the file from every commit that contains it (for example, git filter-repo --strip-blobs-bigger-than 10M removes all blobs above that size), then run git gc --aggressive --prune=now.
    • Why it works: Rewriting history produces new commits that no longer reference the large blob, and gc cleans up the old, now-unreachable objects.
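  • Forgetting to run git gc regularly: Git runs git gc --auto after some commands, but it only acts once thresholds such as gc.auto (6700 loose objects by default) are exceeded, which may not be often enough for a busy repository.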

    • Diagnosis: git count-objects -vH shows a large number of loose objects or many packfiles.
    • Fix: Schedule git gc to run periodically, for example via a cron job or a Git hook. git gc --auto does work only when its thresholds are exceeded; plain git gc always performs a full run.
    • Why it works: Regular garbage collection keeps the repository in an optimized state, preventing significant performance degradation.
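The large-object diagnosis from the list above can be tried safely in a throwaway repository. This is a minimal sketch, assuming a POSIX shell with git, awk, sort, and head; the 200 KiB zero-filled file merely stands in for an accidentally committed binary, and paths containing spaces would be cut off at the first space by the awk step.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'small\n' > notes.txt
# 200 KiB of zero bytes stands in for an accidentally committed binary.
head -c 204800 /dev/zero > large_binary.zip
git add notes.txt large_binary.zip
git commit -q -m "add files"

# Delete the file in a later commit; the blob stays reachable through history.
git rm -q large_binary.zip
git commit -q -m "remove large file"

# List blobs as "<size in bytes> <path>", largest first.
largest=$(
    git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $3, $4 }' |
    sort -n -r |
    head -n 5
)
echo "$largest"
```

The deleted file still tops the list, which is why removing it from the working tree alone does not shrink the repository.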

The git gc command, especially with --aggressive and --prune=now, can be quite resource-intensive. It’s often best run during off-peak hours or on a repository that isn’t actively being worked on by many people simultaneously.
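One way to avoid paying that cost unnecessarily is to check the loose-object count first and run a full gc only when it is worthwhile. The sketch below shows the idea with a hypothetical helper, gc_if_needed, which is not a Git command. It uses the exact count from git count-objects, whereas git gc --auto relies on its own estimate, so treat it as an illustration rather than a replacement.

```shell
#!/bin/sh
set -eu

# gc_if_needed THRESHOLD
# Hypothetical helper: run a full `git gc` in the current repository only
# when the number of loose objects exceeds THRESHOLD.
gc_if_needed() {
    threshold=$1
    loose=$(git count-objects -v | awk '$1 == "count:" { print $2 }')
    if [ "$loose" -gt "$threshold" ]; then
        git gc -q
        echo "ran"
    else
        echo "skipped"
    fi
}

# Demonstration in a hypothetical scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "initial commit"          # one blob, one tree, one commit

first=$(gc_if_needed 6700)   # 6700 mirrors Git's default gc.auto value
second=$(gc_if_needed 2)     # artificially low threshold to force a run
echo "first call: $first, second call: $second"
```

A scheduled job could call such a helper during off-peak hours, so the expensive repack happens only in repositories that have actually accumulated loose objects.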

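After significant history rewriting, the old objects stay on disk for as long as reflog entries still reference them. git filter-repo expires reflogs and repacks on its own; with other tools, run git reflog expire --expire=now --all followed by git gc --aggressive --prune=now to fully clean up the unreachable objects from the previous history.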

The next challenge you’ll likely face is understanding how to configure the garbage collection behavior to run automatically and efficiently for your team’s workflow.
