git archive is the secret weapon for shipping Git repositories without the .git directory.
Let’s see it in action. Imagine you have a project, and you want to send a clean snapshot of your main branch to a colleague, or perhaps to a build server for deployment.
git archive --format=tar --output=myproject-v1.0.tar main
This command creates a tarball named myproject-v1.0.tar containing the files and directories exactly as they are committed at the tip of the main branch. No .git folder, no commit history, and no uncommitted changes from your working tree, just the committed files.
The magic here is that git archive doesn’t copy files out of your working directory; it reads them directly from Git’s internal object database. This skips the checkout step entirely and guarantees you’re getting exactly what is recorded in a specific commit, tag, or branch, regardless of any local edits.
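To make the object-database point concrete, here is a minimal sketch that archives HEAD while the working tree holds uncommitted edits. The throwaway repository, the file names, and the Demo identity are all made up for illustration.

```shell
set -eu
# Throwaway repository and output directory (hypothetical names throughout).
repo=$(mktemp -d)
out=$(mktemp -d)
git -C "$repo" init -q
echo "committed" > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "Add notes"

# Edit the tracked file and create a new one, committing neither.
echo "uncommitted edit" > "$repo/notes.txt"
echo "scratch" > "$repo/untracked.txt"

# Archive HEAD, then unpack it elsewhere to inspect the result.
git -C "$repo" archive --format=tar --output="$out/head.tar" HEAD
mkdir "$out/unpacked"
tar -xf "$out/head.tar" -C "$out/unpacked"

archived_text=$(cat "$out/unpacked/notes.txt")
archived_files=$(ls -A "$out/unpacked")
echo "notes.txt in archive: $archived_text"
echo "files in archive: $archived_files"
```

The unpacked notes.txt should still read "committed", and untracked.txt should be absent, because neither change ever reached the object database.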
Consider a scenario where you need to deploy a specific version of your application. You’ve tagged v1.2.3.
git archive --format=zip --output=app-v1.2.3.zip v1.2.3
This generates a zip file, app-v1.2.3.zip, containing the project’s state at the v1.2.3 tag. This is perfect for deployment because you’re not carrying around the entire history, just the code itself.
You can also archive a specific commit hash:
git log --oneline -5
# Output:
# a1b2c3d (HEAD -> main) Add feature X
# e4f5a6b Fix bug Y
# 17a8b9c Initial commit
# ...
git archive --format=tar.gz --output=snapshot-a1b2c3d.tar.gz a1b2c3d
This creates a compressed tarball (.tar.gz) of the repository’s state at commit a1b2c3d.
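In a script you would probably not copy the hash from git log by hand. The sketch below, run against a throwaway repository with made-up file names, resolves the abbreviated hash with git rev-parse and uses it for both the file name and the tree-ish.

```shell
set -eu
# Throwaway repository (hypothetical content and identity).
repo=$(mktemp -d)
git -C "$repo" init -q
echo "hello" > "$repo/README"
git -C "$repo" add README
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Resolve the abbreviated hash instead of copying it from `git log`.
short=$(git -C "$repo" rev-parse --short HEAD)
archive="$repo/snapshot-$short.tar.gz"
git -C "$repo" archive --format=tar.gz --output="$archive" "$short"

# List the compressed archive to confirm what went in.
listing=$(tar -tzf "$archive")
echo "$archive"
echo "$listing"
```

The archive file itself sits in the working directory but is untracked, so it never ends up inside a later archive.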
The core problem git archive solves is the need for a "clean" copy of your project’s files. When you clone a repository, you get the .git directory, which contains all the history, branches, tags, and configuration. While essential for development, it’s often unnecessary (and can be large) for distribution, building, or deployment. git archive strips this away, leaving only the project’s actual content.
Internally, git archive traverses the tree object associated with the specified commit, branch, or tag. It then reads the blob objects for each file and streams their content into the chosen archive format (tar, zip, etc.). It doesn’t need to check out files into your working directory, making it fast and suitable for scripting.
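The tree-and-blob walk can be observed with Git's plumbing commands. This is only an illustration of the idea, not how git archive is implemented; the repository and file names are invented for the demo.

```shell
set -eu
# Throwaway repository with one nested file (hypothetical names).
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir "$repo/src"
echo "int main(void) { return 0; }" > "$repo/src/main.c"
echo "demo" > "$repo/README"
git -C "$repo" add .
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Walk the commit's tree: every blob path recorded for HEAD.
tree_files=$(git -C "$repo" ls-tree -r --name-only HEAD | sort)

# File entries in the archive (directory entries end in "/" and are dropped).
archive_files=$(git -C "$repo" archive --format=tar HEAD | tar -tf - | grep -v '/$' | sort)

# Read one blob straight from the object database, no checkout involved.
blob=$(git -C "$repo" cat-file -p HEAD:README)

echo "$tree_files"
echo "$archive_files"
echo "$blob"
```

The two listings should match, since the archive is built from the same tree that git ls-tree reports.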
You can even specify a subset of files or directories to archive, although this is less common for the primary use case of shipping the whole project.
git archive --format=tar --output=src.tar main -- src/
This archives only the src/ directory from the main branch. Paths inside the archive still begin with src/.
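A quick way to check what a path-limited archive contains is to list it instead of writing it to disk. The sketch below uses a throwaway repository with hypothetical src/ and docs/ directories.

```shell
set -eu
# Throwaway repository with two top-level directories (hypothetical names).
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir "$repo/src" "$repo/docs"
echo "code" > "$repo/src/app.txt"
echo "manual" > "$repo/docs/guide.txt"
git -C "$repo" add .
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Limit the archive to src/ and list its entries without extracting.
subset=$(git -C "$repo" archive --format=tar HEAD -- src/ | tar -tf -)
echo "$subset"
```

Only entries under src/ should be listed, each with its src/ prefix intact.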
The output format is a crucial parameter. Common choices include tar (uncompressed tarball), tar.gz or tgz (gzipped tarball), and zip. The --output flag specifies the filename for the archive. If omitted, git archive will write to standard output, which is useful for piping the archive directly to another command.
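git archive --format=tar main | ssh user@server "tar -xf - -C /deploy/path"
This example pipes a tar archive of the main branch directly to an SSH command, which then extracts it into /deploy/path on a remote server. The explicit -f - tells tar to read the archive from standard input; without it, some tar builds try to read from a default tape device instead.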
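The same pipe can be tried locally before involving a server. In this sketch a temporary directory stands in for /deploy/path, and the repository, file name, and Demo identity are invented for the demo.

```shell
set -eu
# Throwaway repository; $deploy stands in for /deploy/path on the server.
repo=$(mktemp -d)
deploy=$(mktemp -d)
git -C "$repo" init -q
echo "<h1>hello</h1>" > "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

# No --output, so the archive goes to stdout and straight into tar.
git -C "$repo" archive --format=tar HEAD | tar -xf - -C "$deploy"

deployed=$(ls -A "$deploy")
echo "$deployed"
```

The target directory should end up with index.html and no .git directory, which is the property that makes this pattern attractive for deployment.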
One subtle aspect is how git archive handles submodules. It does not recurse into them: the parent repository’s tree records only the commit each submodule is pinned to, and the archive ends up with an empty directory at the submodule’s path rather than the submodule’s files. If you need the submodule content, you’d typically need to initialize and update the submodules separately or use a different process. Check for this before relying on an archive for deployment, because a build that expects submodule code will find an empty directory instead.
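To see what actually lands in an archive at a submodule path, the sketch below records a gitlink entry (mode 160000, the kind of entry git submodule add creates) by hand with git update-index, which avoids cloning a second repository. The path vendor/lib and the reuse of HEAD's own hash as the pinned commit are shortcuts assumed purely for the demo.

```shell
set -eu
# Throwaway parent repository (hypothetical content and identity).
repo=$(mktemp -d)
git -C "$repo" init -q
echo "parent" > "$repo/README"
git -C "$repo" add README
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Record a gitlink at vendor/lib. Pinning it to HEAD's own hash is a demo
# shortcut; a real submodule would point at a commit in another repository.
pinned=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" update-index --add --cacheinfo "160000,$pinned,vendor/lib"
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "Pin vendor/lib"

# The tree records the pinned commit for vendor/lib.
git -C "$repo" ls-tree -r HEAD

# List what the archive contains at and around that path.
listing=$(git -C "$repo" archive --format=tar HEAD | tar -tf -)
echo "$listing"
```

The listing should show vendor/lib/ as a directory entry with nothing beneath it, while git ls-tree shows the pinned commit that the archive leaves out.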
The next step after generating these clean archives is often integrating them into CI/CD pipelines or setting up automated build processes that consume these exact, versioned snapshots.