pg_dump and pg_restore are your go-to tools for backing up and restoring PostgreSQL databases, but they’re not just simple copy commands. They handle the intricacies of database structure, data, and even object ownership.
Let’s see pg_dump in action. Imagine you have a PostgreSQL database named mydatabase running on localhost:5432. To create a plain-text SQL backup, you’d run:
pg_dump -h localhost -p 5432 -U myuser -Fp mydatabase > mydatabase_backup.sql
This command connects to mydatabase as myuser, dumps its contents into a file named mydatabase_backup.sql in a plain SQL format. The -Fp flag explicitly specifies plain text format.
Now, to restore this backup into a new, empty database called newdb on the same server:
psql -h localhost -p 5432 -U myuser -d newdb < mydatabase_backup.sql
This uses psql, the PostgreSQL interactive terminal, to execute the SQL commands contained within mydatabase_backup.sql against the newdb database.
However, for larger databases, plain text can be inefficient. pg_dump offers custom (-Fc) and directory (-Fd) formats, which are compressed and allow for parallel restores.
Here’s a custom format backup:
pg_dump -h localhost -p 5432 -U myuser -Fc mydatabase > mydatabase_backup.dump
And here’s how you’d restore it with pg_restore:
pg_restore -h localhost -p 5432 -U myuser -d newdb mydatabase_backup.dump
Notice that pg_restore is used instead of psql for compressed formats. It’s designed to understand the archive formats pg_dump creates.
The real power of pg_restore shines with parallel restores. If your backup was created in directory format (-Fd):
pg_dump -h localhost -p 5432 -U myuser -Fd mydatabase -j 4 -f mydatabase_backup_dir
The -j 4 flag tells pg_dump to use 4 parallel jobs to create the backup. This speeds up the backup process significantly. The output will be a directory named mydatabase_backup_dir containing multiple files.
To restore this parallel backup:
pg_restore -h localhost -p 5432 -U myuser -d newdb -j 4 mydatabase_backup_dir
Again, -j 4 is used, this time to leverage 4 parallel jobs for the restore operation, drastically cutting down the time it takes to bring the database back online.
What most people don’t realize is that pg_dump generates a set of SQL commands that, when executed in order, recreate the database. This includes CREATE TABLE statements, COPY commands for data, and ALTER TABLE statements for constraints and indexes. pg_restore essentially takes this structured output and applies it. For custom and directory formats, it’s more than just SQL; it’s a serialized archive that pg_restore intelligently unpacks. The --jobs (-j) option in pg_restore is particularly effective because it can restore independent objects (like tables that don’t have foreign key dependencies on each other) concurrently, making full use of multi-core processors.
When you need to selectively restore parts of a backup, pg_dump’s -t (table), -n (schema), and -T (exclude table) options are invaluable, and pg_restore can then be used to apply these filtered backups.