Neo4j’s backup and restore mechanisms are designed to be remarkably simple, but the nuances of incremental backups can catch you off guard if you’re not paying attention to the underlying transaction log.
Let’s see Neo4j in action with a simulated scenario. Imagine we have a running Neo4j instance, and we want to back it up.
First, we’ll create a simple graph.
CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})
RETURN a, b
Now, let’s perform a full backup. This is your baseline. You’d typically do this when you first set up your database or periodically to have a complete snapshot.
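/path/to/neo4j/bin/neo4j-admin database backup --type=FULL --to-path=/backups/full_backup_20231027 neo4j
This command writes a full backup artifact (a .backup file) for the neo4j database into the directory /backups/full_backup_20231027. It’s a complete, self-contained copy of the database as of that moment. Note that online backup is an Enterprise Edition feature, and that it is a different tool from neo4j-admin database dump, which writes an offline archive of a stopped database and can’t serve as the base for incremental backups. The syntax shown here is Neo4j 5’s; Neo4j 4.x uses neo4j-admin backup with --backup-dir instead.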
Now, let’s add more data to simulate changes that would occur between full backups.
CREATE (c:Person {name: 'Charlie'})-[:LIKES]->(a)
RETURN c
If you were to perform another full backup now, it would be a separate, complete copy, potentially wasting storage if the changes are small. This is where incremental backups shine (Neo4j 5 calls them differential backups). They leverage the transaction log: every write operation is recorded there, so an incremental backup only needs to capture the transactions committed since the last backup (either full or incremental).
To perform an incremental backup, you don’t need to track transaction IDs yourself, and there is no flag for passing one in. Neo4j works out where the previous backup ended from the artifacts already in the target directory. All you do is point the next backup at the same directory as the full backup and ask for a differential:
/path/to/neo4j/bin/neo4j-admin database backup --type=DIFF --to-path=/backups/full_backup_20231027 neo4j
This command adds a new, usually much smaller, artifact to /backups/full_backup_20231027 containing only the transactions committed since the previous backup in that directory (here, the one that created Charlie). Together, the artifacts form a backup chain.
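The choice between a full and an incremental run can also be scripted. The sketch below is a hypothetical helper, not part of Neo4j: it assumes Neo4j 5 Enterprise’s neo4j-admin database backup with its --type option, assumes backup artifacts end in .backup, and only prints the command rather than running it. The function names are made up for illustration.

```shell
# Hypothetical helper: pick a backup type from what the directory already holds.
# $1 = backup directory. Prints DIFF if any *.backup file exists there, else FULL.
choose_backup_type() {
  for artifact in "$1"/*.backup; do
    if [ -e "$artifact" ]; then
      echo DIFF
      return 0
    fi
  done
  echo FULL
}

# $1 = backup directory, $2 = database name.
# Prints (does not run) the backup command for the chosen type.
backup_command() {
  printf 'neo4j-admin database backup --type=%s --to-path=%s %s\n' \
    "$(choose_backup_type "$1")" "$1" "$2"
}
```

Neo4j’s own --type=AUTO makes a similar decision for you, so a wrapper like this is mostly useful when you want the choice logged or forced on a schedule.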
Let’s make another change:
CREATE (d:Person {name: 'David'})
RETURN d
And take another incremental backup, which picks up where the previous one ended:
/path/to/neo4j/bin/neo4j-admin database backup --type=DIFF --to-path=/backups/full_backup_20231027 neo4j
This adds another artifact to the chain in /backups/full_backup_20231027, holding the latest transactions.
The mental model here is that a full backup is a snapshot of the data files. Incremental backups are append-only logs of changes. On restore, the full backup is laid down first, then each incremental backup is applied on top of it in sequence.
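That model implies the chain must be unbroken: each incremental has to start exactly where the previous backup stopped. Here is a toy sketch of that check. It does not read real Neo4j artifacts; the FIRST:LAST ranges are passed in by hand, the function name is made up, and the transaction numbers are purely illustrative.

```shell
# Toy model: each argument is the transaction range one backup covers, as FIRST:LAST.
# The first argument stands for the full backup, the rest for incrementals in order.
check_chain() {
  if [ "$#" -eq 0 ]; then
    echo "no backups given" >&2
    return 1
  fi
  expected=""
  for range in "$@"; do
    first=${range%%:*}
    last=${range##*:}
    # Every backup after the first must begin right after the previous one ended.
    if [ -n "$expected" ] && [ "$first" -ne "$expected" ]; then
      echo "gap: expected transaction $expected, but next backup starts at $first" >&2
      return 1
    fi
    expected=$((last + 1))
  done
  echo "chain ok, restorable up to transaction $((expected - 1))"
}
```

Calling check_chain 1:12345 12346:12390 12391:12400 reports a usable chain, while dropping the middle range makes it fail, which is exactly the situation where you would have to fall back to the last unbroken point.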
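The key levers you control are --to-path (the directory where backup artifacts are stored) and --type (FULL, DIFF, or AUTO; AUTO, the default, takes a differential backup when the directory already holds a backup to build on and a full one otherwise). The database name, passed as the last argument, specifies which database within your Neo4j instance you’re backing up (useful if you’re running multiple databases).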
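To restore, the target database must not be running: stop it first if it already exists (STOP DATABASE neo4j, run against the system database), or restore under a name that isn’t in use yet. Then, you’d use neo4j-admin database restore.
For a full restore, point --from-path at the full backup artifact (the file name below is a placeholder for the real artifact name):
/path/to/neo4j/bin/neo4j-admin database restore --from-path=/backups/full_backup_20231027/<full-artifact>.backup --overwrite-destination neo4j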
For an incremental restore, you don’t replay the pieces by hand, and there is no separate "append" mode. Point --from-path at the last differential artifact you want included; the restore command looks in the same directory for the chain that ends at that artifact, then applies the full backup followed by each differential in order.
# Restore the full backup plus every differential up to and including the named artifact
# (the file name is a placeholder for the real artifact name)
/path/to/neo4j/bin/neo4j-admin database restore --from-path=/backups/full_backup_20231027/<latest-differential-artifact>.backup --overwrite-destination neo4j
The --overwrite-destination flag is crucial here. It tells Neo4j to replace the existing database with the restored one; without it, the command refuses to touch a database that already exists. To stop partway through the chain, name an earlier artifact, or use --restore-until to pick a point in time or a transaction ID. Once the restore finishes, bring the database back online with START DATABASE neo4j (or CREATE DATABASE if the name is new).
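If you script restores, a small guard that finds the newest artifact and refuses to continue when the directory is empty can save a bad night. The sketch below is hypothetical: it assumes artifacts end in .backup, treats file modification time as a stand-in for backup order, assumes paths without spaces or newlines, and prints the Neo4j 5 restore command instead of running it.

```shell
# $1 = backup directory. Prints the newest *.backup file by modification time,
# or nothing when the directory holds no artifacts.
latest_artifact() {
  ls -t "$1"/*.backup 2>/dev/null | head -n 1
}

# $1 = backup directory, $2 = database name.
# Prints (does not run) a restore command for the newest artifact.
restore_command() {
  artifact=$(latest_artifact "$1")
  if [ -z "$artifact" ]; then
    echo "no .backup artifacts found in $1" >&2
    return 1
  fi
  echo "neo4j-admin database restore --from-path=$artifact --overwrite-destination $2"
}
```

Printing the command first is a deliberate choice: --overwrite-destination is destructive, so it is worth seeing exactly which artifact was picked before anything runs.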
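The one thing most people don’t realize is that an incremental backup depends on transaction logs the server still has on disk. The server can only hand over the transactions committed since the previous backup if the log files containing them still exist. If your transaction logs are rotated and pruned (which is default behavior for managing log size) before the next backup runs, Neo4j can’t produce an incremental backup from that point: depending on the backup type you asked for, the backup either fails or falls back to a new full backup. This is why managing transaction log retention is critical for a robust incremental backup strategy. In Neo4j 5 the relevant neo4j.conf setting is db.tx_log.rotation.retention_policy (dbms.tx_log.rotation.retention_policy in 4.x), and it needs to comfortably cover the interval between backups.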
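A quick way to see what retention you are actually running with is to read it out of the configuration file. This is a minimal sketch, assuming the Neo4j 5 setting name db.tx_log.rotation.retention_policy and a plain key=value neo4j.conf; the function name is made up, and the path you pass in depends on your installation.

```shell
# $1 = path to a neo4j.conf-style file.
# Prints the configured retention policy (last uncommented occurrence),
# or "unset" when the key is absent or commented out.
retention_policy() {
  value=$(sed -n 's/^[[:space:]]*db\.tx_log\.rotation\.retention_policy[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1)
  echo "${value:-unset}"
}
```

If this prints unset, the server is using its built-in default, so check the documentation for your version before assuming the logs will outlive your backup interval.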
The next problem you’ll run into is managing the lifecycle of these transaction log files, and of the backup artifacts themselves, so that every link in the chain is still there when the next incremental backup or a restore needs it.