The tar command, a cornerstone of Linux archiving, is often mistaken for a compression tool, but its primary job is to bundle many files into a single archive.
Let’s see it in action. Imagine you have a directory named my_project with a few files inside:
$ ls -R my_project/
my_project/:
file1.txt file2.txt subdir/
my_project/subdir/:
another_file.txt
To create a single archive file named my_project.tar containing everything in my_project, you’d use:
tar -cvf my_project.tar my_project/
Here’s what’s happening:
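-c: Create a new archive.
-v: Verbose output, listing each file as it’s added.
-f: Use the next argument (here my_project.tar) as the archive’s filename. Because f consumes that argument, it has to be the last letter in a bundled group like -cvf.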
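The result is my_project.tar, a single file that contains my_project/file1.txt, my_project/file2.txt, and my_project/subdir/another_file.txt, plus entries for the directories themselves. Alongside each file’s contents, the tar format records metadata such as permissions, ownership, and modification time. You can then extract it: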
tar -xvf my_project.tar
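-x: Extract files from an archive. Paths are recreated relative to the current directory, so this produces my_project/ and its contents wherever you run the command.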
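To try the commands above end to end, the sketch below recreates the example my_project tree (the file contents are invented for illustration), archives it, extracts the copy into a separate directory with -C, and checks with diff -r that nothing was lost. It works inside a scratch directory from mktemp -d so your current directory is left alone.

```shell
#!/usr/bin/env bash
# Round trip: build the sample tree, archive it, extract elsewhere, compare.
set -eu

work=$(mktemp -d)   # scratch directory so nothing in $PWD is touched
cd "$work"

# Recreate the example layout (file contents are made up)
mkdir -p my_project/subdir
echo "one"   > my_project/file1.txt
echo "two"   > my_project/file2.txt
echo "three" > my_project/subdir/another_file.txt

# Bundle the directory, no compression
tar -cvf my_project.tar my_project/

# Extract into a separate directory; -C switches into it before extracting
mkdir restored
tar -xvf my_project.tar -C restored

# diff -r reports any file that differs or is missing
diff -r my_project restored/my_project && echo "round trip OK"
```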
Now, where does compression come in? tar itself doesn’t compress. It’s a bundler. To compress the archive, you pipe its output to a compression utility, or use tar’s built-in flags that leverage these utilities.
The most common compression programs paired with tar are gzip and bzip2, with xz as a widely used third option.
Gzip Compression:
To create a gzipped tar archive (.tar.gz or .tgz), you add the -z flag:
tar -czvf my_project.tar.gz my_project/
-z: Filter the archive through gzip.
tar does not build the whole .tar in memory first. It streams the archive data into gzip as it reads the files, and gzip writes the compressed result to my_project.tar.gz, so no intermediate uncompressed file is created.
To extract a .tar.gz file:
tar -xzvf my_project.tar.gz
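A quick way to see that -z only adds a compression step is to build both archives from the same directory and compare them. This sketch assumes tar and gzip are on your PATH; gzip -t verifies the compressed stream without extracting anything. With a handful of tiny files the .tar.gz comes out far smaller mostly because tar pads its output to fixed-size blocks (10240 bytes by default in GNU tar), and that padding compresses to almost nothing.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d); cd "$work"

# Sample tree (contents are made up)
mkdir -p my_project/subdir
echo "one"   > my_project/file1.txt
echo "two"   > my_project/file2.txt
echo "three" > my_project/subdir/another_file.txt

tar -cf  my_project.tar    my_project/   # plain bundle
tar -czf my_project.tar.gz my_project/   # same bundle, streamed through gzip

# Check the gzip stream's integrity without writing any files
gzip -t my_project.tar.gz

# $(( )) normalizes wc output (some systems pad it with spaces)
tar_size=$(( $(wc -c < my_project.tar) ))
gz_size=$(( $(wc -c < my_project.tar.gz) ))
echo "tar: $tar_size bytes, tar.gz: $gz_size bytes"

# Extract the compressed archive elsewhere and compare with the original
mkdir restored
tar -xzf my_project.tar.gz -C restored
diff -r my_project restored/my_project
```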
Bzip2 Compression:
For potentially better compression ratios (though often slower), you can use bzip2 with the -j flag:
tar -cjvf my_project.tar.bz2 my_project/
-j: Filter the archive through bzip2.
And to extract:
tar -xjvf my_project.tar.bz2
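The same round trip works with bzip2. Because bzip2 isn’t installed on every system, this sketch checks for it first; bzip2 -t tests the archive’s integrity the way gzip -t does for .gz files.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d); cd "$work"

# Sample tree (contents are made up)
mkdir -p my_project/subdir
echo "one"   > my_project/file1.txt
echo "three" > my_project/subdir/another_file.txt

if command -v bzip2 >/dev/null 2>&1; then
  tar -cjf my_project.tar.bz2 my_project/   # -j streams the archive through bzip2
  bzip2 -t my_project.tar.bz2               # integrity check, no extraction
  mkdir restored
  tar -xjf my_project.tar.bz2 -C restored
  diff -r my_project restored/my_project && echo "bzip2 round trip OK"
else
  echo "bzip2 is not installed; skipping" >&2
fi
```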
LZMA Compression (XZ):
xz, which uses the LZMA2 algorithm, usually compresses smaller still, though compression is typically slower than with gzip. Its flag is the capital -J:
tar -cJvf my_project.tar.xz my_project/
-J: Filter the archive through xz.
Extraction:
tar -xJvf my_project.tar.xz
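Which compressor wins depends heavily on the data, so it’s worth measuring on your own files. The sketch below archives one directory with each of the three flags and prints the sizes. compress_with is a hypothetical helper defined just for this example, not a tar feature, and any compressor missing from the system is skipped. The generated log-like text is made up; on real data the gap between bzip2 and xz can be small or even reversed.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d); cd "$work"

# Sample tree with some repetitive, log-like text (made up)
mkdir -p my_project/subdir
for i in $(seq 1 2000); do echo "line $i of a fairly repetitive log"; done > my_project/file1.txt
echo "three" > my_project/subdir/another_file.txt

tar -cf my_project.tar my_project/
printf '%-20s %8s\n' "archive" "bytes"
printf '%-20s %8d\n' my_project.tar "$(( $(wc -c < my_project.tar) ))"

# Hypothetical helper: $1 = tar flag letter, $2 = extension, $3 = program tar runs
compress_with() {
  if command -v "$3" >/dev/null 2>&1; then
    tar -c"$1"f "my_project.tar.$2" my_project/
    printf '%-20s %8d\n' "my_project.tar.$2" "$(( $(wc -c < "my_project.tar.$2") ))"
  else
    echo "$3 not installed; skipping .$2" >&2
  fi
}

compress_with z gz  gzip
compress_with j bz2 bzip2
compress_with J xz  xz
```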
The core problem tar solves is turning a directory structure with many files and subdirectories into a single, manageable file. This is crucial for backups, transferring collections of files, or deploying software. Compression is a secondary, but often essential, step to reduce the storage space and network bandwidth required.
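When extracting, modern GNU tar and bsdtar detect the compression type automatically, so tar -xvf my_project.tar.gz works without -z. GNU tar does this by checking the signature bytes at the start of the archive, and falls back to the filename extension (e.g., .gz, .bz2, .xz) only if that fails. The flags still matter in two cases: when tar reads the archive from a pipe or from a device without random access, such as a tape drive, where GNU tar cannot detect the format and asks for the matching flag; and with older or minimal tar implementations that lack auto-detection. With GNU tar, giving a flag that doesn’t match the data (for example, -j on a gzip archive) makes the decompressor fail with an error rather than silently producing a corrupted file.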
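Signature-based detection is easy to imitate in a few lines of shell. detect_compression below is a hypothetical helper written for illustration, a simplified stand-in for what GNU tar does internally; it only knows three signatures: gzip (1f 8b), bzip2 (the ASCII "BZh", i.e. 42 5a 68) and xz (fd 37 7a 58 5a 00). The flag it returns is then passed explicitly to a tar reading from a pipe, the case where GNU tar needs it.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the tar flag matching a file's leading bytes,
# or nothing if no known compression signature is found.
detect_compression() {
  local magic
  # od prints the first 6 bytes as hex pairs; tr strips spaces and newlines
  magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
  case $magic in
    1f8b*)        printf '%s\n' -z ;;   # gzip
    425a68*)      printf '%s\n' -j ;;   # bzip2 ("BZh")
    fd377a585a00) printf '%s\n' -J ;;   # xz
    *)            ;;                    # assume an uncompressed tar
  esac
}

work=$(mktemp -d); cd "$work"
mkdir -p my_project/subdir
echo "one" > my_project/file1.txt
tar -cf  my_project.tar    my_project/
tar -czf my_project.tar.gz my_project/

flag=$(detect_compression my_project.tar.gz)
echo "detected flag: ${flag:-none}"

# cat stands in for any stream (ssh, curl, ...). $flag is left unquoted on
# purpose so that an empty result adds no argument.
mkdir restored
cat my_project.tar.gz | tar -x $flag -f - -C restored
```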
The real power comes from combining tar’s archiving with other commands via pipes. For instance, you can compress a large log file before archiving it, or archive a directory and stream it straight over SSH without writing the archive to local disk:
# Archive a directory, compress it with gzip, and send it over SSH
tar -czf - my_project/ | ssh user@remote_host 'cat > my_project_backup.tar.gz'
Here, - as the archive filename tells tar to write the archive to standard output. That output is piped (|) to ssh, which runs cat > my_project_backup.tar.gz on the remote host; cat reads the stream from its standard input and writes it to a file there.
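The same stdout/stdin pattern can be tried without a remote machine by letting a local subshell play the part of ssh. The sketch below saves the stream to a file in dest/, as the ssh example does, and then shows the other common variant: unpacking on the receiving side with tar -xzf -, where -z is needed because tar is reading from a pipe. The ssh form of that variant appears only as a comment; it is untested here and assumes the remote host has tar and gzip.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d); cd "$work"

# Sample tree (contents are made up)
mkdir -p my_project/subdir
echo "one"   > my_project/file1.txt
echo "three" > my_project/subdir/another_file.txt
mkdir dest

# Variant 1: the subshell stands in for the remote host and saves the stream
tar -czf - my_project/ | (cd dest && cat > my_project_backup.tar.gz)

# Variant 2: unpack on the receiving side instead of saving the archive.
# Remote equivalent (untested here):
#   tar -czf - my_project/ | ssh user@remote_host 'tar -xzf -'
tar -czf - my_project/ | (cd dest && tar -xzf -)

diff -r my_project dest/my_project && echo "copy OK"
```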
The most surprising thing about tar’s compression flags is that they are often just wrappers around the actual compression utilities. GNU tar’s -z, for example, starts the external gzip program and streams the archive through it, so you could achieve the same result by piping manually: tar -cf - my_project/ | gzip > my_project.tar.gz. The flags simply shorten this common workflow.
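To check that the flag and the manual pipe are interchangeable, the sketch below builds one archive each way and also decompresses manually with gzip -dc, which writes the plain tar stream to standard output. It compares the extracted trees rather than the .gz files themselves, since compressed bytes can differ with compressor version or settings even when the contents match.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d); cd "$work"

# Sample tree (contents are made up)
mkdir -p my_project/subdir
echo "one"   > my_project/file1.txt
echo "three" > my_project/subdir/another_file.txt

# Built-in flag vs. explicit pipe
tar -czf flag.tar.gz my_project/
tar -cf - my_project/ | gzip > pipe.tar.gz

# Manual decompression: gzip -dc feeds the uncompressed tar stream to tar -xf -
mkdir from_flag from_pipe
gzip -dc flag.tar.gz | tar -xf - -C from_flag
gzip -dc pipe.tar.gz | tar -xf - -C from_pipe

diff -r from_flag from_pipe && echo "same contents"
```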
The next concept you’ll likely encounter is handling different file permissions, ownership, and symbolic links correctly during archiving and extraction, which tar also manages with specific flags.