Git’s pack files are the unsung heroes of efficient repository management, but the real magic isn’t just compression; it’s how they represent history as a series of deltas, allowing Git to reconstruct any commit by referencing only a fraction of the data.

Let’s peek inside a typical Git repository. Navigate to your .git/objects directory. You’ll see a mix of single-character directories and files. These are your loose objects, individual Git objects (blobs, trees, commits, tags) stored independently. When you run commands like git gc (garbage collection), Git consolidates these loose objects into pack files, typically found in .git/objects/pack.

Here’s a simplified view of a repository’s objects being packed. Imagine we have three commits:

  • Commit A: Contains file1.txt with "Hello".
  • Commit B: Modifies file1.txt to "Hello, World!".
  • Commit C: Adds file2.txt with "Another file" and modifies file1.txt to "Hello, Git!".

Without packing, Git would store three separate blob objects for file1.txt (one for each version) and one blob for file2.txt, along with their respective tree and commit objects.

When Git packs these, it doesn’t just gzip everything. It identifies similarities.

  1. Finding Base Objects: Git first identifies a set of "base" objects. These are typically objects that haven’t changed recently or are referenced by many other objects.
  2. Delta Compression: For objects that have changed, Git looks for the closest preceding version (or even a completely different object with similar content) and stores only the differences (the delta) between them.
    • The blob for file1.txt in Commit B is stored as a delta against the blob in Commit A. Git records instructions like "take blob A, insert ’ World!’ at offset 5".
    • The blob for file1.txt in Commit C is stored as a delta against the blob in Commit B (or potentially Commit A, depending on Git’s heuristics). Git records "take blob B, replace 'World!' with ’ Git!'".
    • The blob for file2.txt in Commit C might be stored as a delta against another existing blob if it’s similar, or as a full object if it’s unique.
  3. Reconstruction: When Git needs to access an object that’s been delta-compressed, it finds the base object (which is stored fully or as a delta from another object, recursively) and applies the delta instructions to reconstruct the desired object.

Let’s see this in action. Suppose you have a file that grows over time.

# Initial setup
mkdir git-pack-demo
cd git-pack-demo
git init

# Commit 1: Create a file
echo "Initial content" > my_document.txt
git add my_document.txt
git commit -m "Add my_document.txt"

# Commit 2: Append to the file
echo "More content added" >> my_document.txt
git add my_document.txt
git commit -m "Append more content"

# Commit 3: Add another file
echo "A different file" > another_file.txt
git add another_file.txt
git commit -m "Add another_file.txt"

# Commit 4: Modify my_document.txt significantly
echo "Completely rewritten content" > my_document.txt
git add my_document.txt
git commit -m "Rewrite my_document.txt"

# Force garbage collection to create pack files
git gc --aggressive --prune=now

After git gc, your .git/objects directory will likely contain fewer loose objects and one or more .pack and .idx files in .git/objects/pack/. The .idx file is an index that helps Git quickly locate objects within the corresponding .pack file.

If you were to inspect the contents of these pack files (using git verify-pack -v .git/objects/pack/pack-*.pack), you’d see that many objects are listed as "delta" and point to a "base" object, along with an offset and size for the delta data.

The most surprising thing about Git’s pack files is that they don’t just delta against previous versions of the same file. Git’s delta algorithm is sophisticated enough to find similar content across any two objects in the repository, even if they belong to different files or different commits entirely. This means if you have two large binary files that are 99% identical, Git can store one fully and the other as a delta against the first, saving immense space.

This delta-based representation is key to Git’s ability to handle massive codebases and long histories efficiently, both for storage and for network transfers during clone and fetch operations.

The next logical step after understanding how objects are packed is to explore how Git uses this packed history during operations like git fetch and git clone.

Want structured learning?

Take the full Git course →