Neon’s Write-Ahead Log (WAL) architecture is built around a distributed, append-only log that allows for efficient replication and recovery.
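Let’s see it in action. Imagine a PostgreSQL database running on Neon. When you perform a CREATE TABLE operation, instead of immediately writing to the database’s data files, PostgreSQL first writes a description of the change to its WAL. The real WAL is a binary format, but rendered as JSON for readability, one of the resulting records might look like this: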
{
  "lsn": "0/1A2B3C4D",
  "timestamp": "2023-10-27T10:30:00Z",
  "type": "xlog",
  "data": {
    "wal_record_type": "Heap/INSERT",
    "relation": {
      "db_id": 16384,
      "rel_id": 1234
    },
    "block_number": 5,
    "offset_in_block": 10,
    "new_tuple": {
      "xmin": 100,
      "xmax": 0,
      "t_ctid": "(5,10)",
      "t_data": "..."
    }
  }
}
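A CREATE TABLE typically produces several WAL records like this one, mostly inserts into the system catalogs. Each record is sent to Neon’s storage layer as soon as it is generated. The append-only nature of the log is key: new records are always added to the end, and existing records are never modified.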
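To make the append-only idea concrete, here is a minimal Python sketch. The LSN helpers follow PostgreSQL’s standard notation (two hexadecimal numbers separated by a slash, forming one 64-bit position). The WalRecord and AppendOnlyLog classes are hypothetical, in-memory illustrations and not Neon’s actual implementation.

```python
from dataclasses import dataclass


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's 'high/low' hex LSN notation to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Convert a 64-bit integer back to 'high/low' hex notation."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


@dataclass(frozen=True)
class WalRecord:
    lsn: int        # position of the record in the log
    payload: bytes  # opaque description of the change


class AppendOnlyLog:
    """Hypothetical in-memory log: records are only ever added at the end."""

    def __init__(self) -> None:
        self._records: list[WalRecord] = []

    def append(self, record: WalRecord) -> None:
        # Reject anything that would land before the current end of the log.
        if self._records and record.lsn <= self._records[-1].lsn:
            raise ValueError("LSNs must increase; existing records are never rewritten")
        self._records.append(record)

    def read_from(self, start_lsn: int) -> list[WalRecord]:
        """Return every record at or after start_lsn, in log order."""
        return [r for r in self._records if r.lsn >= start_lsn]


log = AppendOnlyLog()
log.append(WalRecord(parse_lsn("0/1A2B3C4D"), b"change 1"))
log.append(WalRecord(parse_lsn("0/1A2B3C90"), b"change 2"))
```

Because every record has a strictly increasing LSN, a reader only needs to remember the last LSN it processed in order to resume from exactly where it left off.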
The problem Neon’s WAL architecture solves is maintaining data consistency and availability across a distributed system while still offering the familiar ACID guarantees of PostgreSQL. Traditionally, replicating WAL in a distributed setting requires complex coordination and risks data loss during network partitions or node failures. Neon’s approach decouples WAL generation from data file storage, so the two can scale and fail independently.
Internally, Neon splits WAL handling between two services. "Safekeeper" instances receive WAL records from the compute node (where your PostgreSQL instance runs) and store them durably; a commit is acknowledged only once a quorum of safekeepers has persisted its WAL. "Pageserver" instances then consume that WAL from the safekeepers, use it to reconstruct page versions, and serve those pages to compute nodes. This separation means that even if a compute node fails, its WAL is safely stored and can be used to bring up a new compute node with the latest state.
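The recovery property described above can be sketched with a toy model. Everything here is an assumption made for illustration: DurableWal stands in for the storage layer, Compute stands in for a PostgreSQL compute node, and state is a simple key-value dictionary. A real Neon compute node fetches pages on demand rather than replaying the whole log, so treat this only as a picture of why durable WAL makes a compute node disposable.

```python
class DurableWal:
    """Hypothetical stand-in for WAL held durably by the storage layer."""

    def __init__(self) -> None:
        self._records: list[tuple[int, str, int]] = []  # (lsn, key, value)

    def append(self, lsn: int, key: str, value: int) -> None:
        self._records.append((lsn, key, value))

    def replay(self):
        """Yield all stored records in log order."""
        return iter(self._records)


class Compute:
    """Hypothetical compute node whose state is derived entirely from the WAL."""

    def __init__(self, wal: DurableWal) -> None:
        self.wal = wal
        self.state: dict[str, int] = {}
        self.last_lsn = 0
        # On startup, rebuild state from whatever the storage layer already holds.
        for lsn, key, value in wal.replay():
            self._apply(lsn, key, value)

    def _apply(self, lsn: int, key: str, value: int) -> None:
        self.state[key] = value
        self.last_lsn = lsn

    def write(self, key: str, value: int) -> int:
        lsn = self.last_lsn + 1
        self.wal.append(lsn, key, value)  # log the change first...
        self._apply(lsn, key, value)      # ...then apply it locally
        return lsn


wal = DurableWal()
first = Compute(wal)
first.write("a", 1)
first.write("b", 2)
del first                  # simulate losing the compute node

second = Compute(wal)      # a fresh node recovers from the stored WAL
```

After the replacement node starts, second.state matches what the failed node had written, because nothing that mattered ever lived only on the compute node.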
In stock PostgreSQL you control WAL behavior through configuration parameters, but Neon abstracts much of this complexity. For instance, wal_level (set to replica or logical for replication), wal_sync_method (which selects the method used to force WAL updates out to disk), and max_wal_size are all managed by Neon to suit its distributed storage. The core mechanism is unchanged: each transaction is represented as a sequence of WAL records, and these records are the source of truth for any data change.
The most surprising thing about how Neon’s WAL is served is that compute nodes don’t rebuild their state by streaming WAL from a single primary. Instead, they request the pages they need from a pageserver, which reconstructs each page as of a given LSN from the stored WAL. This allows read-heavy workloads to be served by multiple read replicas that draw on the same storage, and it makes failover fast because the WAL is already durably stored and readily accessible.
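Serving reads from stored WAL comes down to answering one question: what did this page look like as of a given LSN? Neon refers to this kind of request as GetPage@LSN. The sketch below is a simplified illustration of the lookup only; the PageStore class and its method names are hypothetical, and it stores complete page images rather than reconstructing them from WAL records.

```python
import bisect
from collections import defaultdict
from typing import Optional


class PageStore:
    """Hypothetical store keeping every version of a page, ordered by LSN."""

    def __init__(self) -> None:
        self._lsns: dict[int, list[int]] = defaultdict(list)
        self._images: dict[int, list[bytes]] = defaultdict(list)

    def put(self, page_id: int, lsn: int, image: bytes) -> None:
        """Record a new version of a page; versions arrive in LSN order."""
        lsns = self._lsns[page_id]
        if lsns and lsn <= lsns[-1]:
            raise ValueError("page versions must be added in increasing LSN order")
        lsns.append(lsn)
        self._images[page_id].append(image)

    def get_page_at_lsn(self, page_id: int, lsn: int) -> Optional[bytes]:
        """Return the newest version of the page with version LSN <= lsn."""
        lsns = self._lsns.get(page_id, [])
        idx = bisect.bisect_right(lsns, lsn)  # count of versions at or before lsn
        if idx == 0:
            return None  # the page did not exist yet at this LSN
        return self._images[page_id][idx - 1]


store = PageStore()
store.put(page_id=5, lsn=100, image=b"v1")
store.put(page_id=5, lsn=200, image=b"v2")

current = store.get_page_at_lsn(5, 200)   # newest version
earlier = store.get_page_at_lsn(5, 150)   # version as of an older LSN
```

Because old versions are kept rather than overwritten, any number of readers can ask for different LSNs at the same time without coordinating with the writer.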
The next concept you’ll encounter is how Neon utilizes these WAL segments for its unique "time travel" query capabilities.