Neon’s high availability hinges on a clever separation of compute and storage, allowing for near-instantaneous failover by redirecting compute to a shared, replicated storage layer.
Let’s watch this in action. Imagine a Neon database mydb running on compute c1 and connected to a branch main.
-- On compute c1, connected to mydb
SELECT pg_current_wal_lsn();
-- Output: 0/1A2B3C4D
Now, let’s simulate a failure of c1. In Neon, this isn’t a catastrophic event. Instead, a new compute instance, c2, is spun up.
# Conceptual only: detach-compute and attach-compute are illustrative names, not real neonctl subcommands. The actual operations go through the Neon API/CLI.
neonctl detach-compute c1
neonctl attach-compute --branch main --name c2
Immediately, c2 connects to the exact same shared storage. Nothing has to be copied to it: the WAL for every committed transaction was already durable in that storage before c1 acknowledged the commit, so c2 sees the latest committed state.
-- On compute c2, connected to mydb
SELECT pg_current_wal_lsn();
-- Output: 0/1A2B3C4D
See? The Log Sequence Number (LSN) is identical. c2 is now seamlessly serving traffic from where c1 left off. The entire process can take mere seconds.
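To compare the two readings programmatically rather than by eye, here is a minimal Python sketch. It relies only on the standard PostgreSQL LSN text format (two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit position). The helper names `parse_lsn` and `lsn_gap_bytes` are made up for this illustration.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '0/1A2B3C4D' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_gap_bytes(earlier: str, later: str) -> int:
    """WAL bytes between two LSNs (negative if 'later' is actually behind)."""
    return parse_lsn(later) - parse_lsn(earlier)


lsn_on_c1 = "0/1A2B3C4D"  # value read on c1 before the failure
lsn_on_c2 = "0/1A2B3C4D"  # value read on c2 after the switch

print(lsn_gap_bytes(lsn_on_c1, lsn_on_c2))  # prints 0
```

A gap of zero means c2 sees exactly the same WAL position that c1 reported.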
The magic is in Neon’s architecture. Unlike traditional PostgreSQL where each replica is a full, independent copy, Neon’s compute nodes are stateless. They are essentially smart clients that read from and write to a shared, distributed log (the WAL stream) held in Neon’s storage layer: Safekeeper nodes make the WAL durable first, and S3-compatible object storage holds it long term. This WAL stream is the single source of truth.
Here’s how it breaks down:
- WAL Streaming: PostgreSQL’s Write-Ahead Log (WAL) is continuously streamed from the active compute node to the object storage. This is the heart of the replication. The WAL records are durably stored as objects.
- Pageserver: A crucial component called the Pageserver is responsible for managing the persistent storage. It receives the WAL stream and reconstructs the database state. When a new compute node starts, it requests the pages it needs from the Pageserver. The Pageserver, using the WAL stream, can reconstruct the state at any point in time within the retained history.
- Compute Nodes: These are ephemeral PostgreSQL instances. They don’t store data locally. They read data pages from the Pageserver and write WAL records back to object storage. Because they all point to the same WAL stream and retrieve pages from the same Pageserver, they are inherently in sync.
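The three roles above can be sketched as a toy Python model. This is not Neon's code: the class names (`SharedLog`, `Pageserver`, `Compute`), the integer LSNs, and the "whole value per record" WAL format are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    lsn: int       # position of this record in the log
    page_id: int   # page the record modifies
    value: str     # new page contents (real WAL stores changes, not whole values)


@dataclass
class SharedLog:
    """Stands in for the durable WAL stream."""
    records: list = field(default_factory=list)

    def append(self, page_id: int, value: str) -> int:
        lsn = len(self.records) + 1
        self.records.append(WalRecord(lsn, page_id, value))
        return lsn

    def last_lsn(self) -> int:
        return len(self.records)


class Pageserver:
    """Rebuilds page contents from the log on request."""

    def __init__(self, log: SharedLog):
        self.log = log

    def get_page(self, page_id: int):
        value = None
        for record in self.log.records:  # replay in LSN order; last write wins
            if record.page_id == page_id:
                value = record.value
        return value


class Compute:
    """Holds no data of its own, only references to the shared services."""

    def __init__(self, name: str, log: SharedLog, pageserver: Pageserver):
        self.name, self.log, self.pageserver = name, log, pageserver

    def write(self, page_id: int, value: str) -> int:
        return self.log.append(page_id, value)

    def read(self, page_id: int):
        return self.pageserver.get_page(page_id)


log = SharedLog()
pageserver = Pageserver(log)

c1 = Compute("c1", log, pageserver)
c1.write(page_id=7, value="hello")
del c1  # c1 "fails"; nothing is lost because it held no data

c2 = Compute("c2", log, pageserver)
print(c2.read(7), log.last_lsn())  # prints: hello 1
```

The point of the sketch is that `Compute` has no state worth saving, so replacing c1 with c2 is only a matter of handing the new object the same two references.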
When a failover occurs, the load balancer or the client application redirects connections from the failed compute node to a new, healthy compute node. This new node simply connects to the Pageserver, fetches pages on demand, and resumes from the last committed LSN, which is where the previous node stopped.
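On the client side, the redirect usually shows up as a dropped connection followed by a short period during which the new compute is not yet accepting connections. A minimal retry sketch in Python, under stated assumptions: `connect_with_retry` and `fake_connect` are hypothetical names, and in a real application the `connect` argument would wrap your PostgreSQL driver's connect call.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


# Hypothetical stand-in for a driver's connect call: fails twice, then succeeds.
calls = {"count": 0}


def fake_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("compute not ready yet")
    return "connected to c2"


delays = []  # record the requested delays instead of actually sleeping
result = connect_with_retry(fake_connect, sleep=delays.append)
print(result, len(delays))  # prints: connected to c2 2
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Production code would typically also cap the total delay and add jitter.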
Consider a scenario where you have a write operation.
- The client sends a COMMIT statement to compute c1.
- c1 writes the transaction to its local WAL buffer and immediately streams these WAL records to object storage.
- Once the WAL records are durably written to object storage, c1 acknowledges the commit to the client.
- The Pageserver ingests these WAL records.
- If c1 fails after acknowledging the commit but before the data pages are fully updated on disk by the Pageserver, a new compute node c2 can connect. c2 will request pages from the Pageserver, and the Pageserver will use the WAL records (which are already in object storage) to replay any necessary changes and serve the correct data.
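The ordering in this write path is what makes the failure case safe, and it can be shown with a small Python simulation. Everything here is a toy under stated assumptions: `DurableLog`, `LazyPageserver`, and `commit` are invented names, LSNs are simple counters, and the Pageserver is made deliberately lazy so that it lags behind the log.

```python
class DurableLog:
    """Stands in for the durable WAL storage described in the steps above."""

    def __init__(self):
        self.records = []  # (lsn, key, value) tuples in LSN order

    def append(self, key, value):
        lsn = len(self.records) + 1
        self.records.append((lsn, key, value))
        return lsn


class LazyPageserver:
    """Ingests WAL only when a request needs it, to mimic ingestion lag."""

    def __init__(self, log):
        self.log = log
        self.pages = {}
        self.ingested_lsn = 0

    def ingest_up_to(self, target_lsn):
        for lsn, key, value in self.log.records[self.ingested_lsn:target_lsn]:
            self.pages[key] = value
            self.ingested_lsn = lsn

    def get_page(self, key, request_lsn):
        # Replay any durable WAL the request depends on before answering.
        if self.ingested_lsn < request_lsn:
            self.ingest_up_to(request_lsn)
        return self.pages.get(key)


def commit(log, key, value):
    """Return (acknowledge) only after the record is in the durable log."""
    lsn = log.append(key, value)  # WAL is made durable first
    return lsn                    # the acknowledgement carries the commit LSN


log = DurableLog()
pageserver = LazyPageserver(log)

acked_lsn = commit(log, "balance", 100)  # c1 acknowledges the commit
lag_at_failure = acked_lsn - pageserver.ingested_lsn
# c1 fails here: the commit is acknowledged but the Pageserver has applied nothing.

# c2 starts and asks for the page as of the latest durable LSN.
value_seen_by_c2 = pageserver.get_page("balance", request_lsn=len(log.records))
print(lag_at_failure, value_seen_by_c2)  # prints: 1 100
```

Because the acknowledgement never precedes the durable write, any commit the client has heard about is always recoverable by replay, regardless of how far the Pageserver had got.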
The key takeaway is that the data is not replicated in the traditional sense of full database dumps or block-level replication. Instead, the transaction log is the replicated entity, and compute nodes are stateless consumers and producers of this log, retrieving data state from the Pageserver.
This architecture also enables features like instant branching and time-travel queries. Because the Pageserver holds the historical state derived from the WAL, you can spin up a new compute node pointing to a specific LSN or timestamp and get a read-only snapshot of the database at that exact moment.
What most users don’t immediately grasp is how the Pageserver reconstructs data. It’s not just a dumb object store. It’s an active component that understands PostgreSQL data pages. When a compute node requests a page, the Pageserver looks at its local cache of pages and then consults the WAL stream to apply any necessary updates from the WAL records to reconstruct the exact version of that page needed. This is why a new compute node can become functional so quickly – it doesn’t need to download gigabytes of data; it just needs the specific pages it requires, which the Pageserver can serve up rapidly.
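The reconstruction step can be sketched as "nearest stored image at or before the requested LSN, plus the WAL records after it". The Python below is a toy model, not Neon's storage format: `PageHistory` is a hypothetical name, a page is a single integer, and each WAL record is an additive delta so the replay is easy to follow.

```python
from bisect import bisect_right


class PageHistory:
    """One page: stored full images plus the WAL deltas recorded between them."""

    def __init__(self):
        self.images = {0: 0}    # lsn -> full page value (the page starts as 0)
        self.deltas = []        # (lsn, amount) pairs in increasing LSN order
        self.last_replayed = 0  # deltas applied by the most recent page_at call

    def add_delta(self, lsn, amount):
        self.deltas.append((lsn, amount))

    def page_at(self, lsn):
        """Rebuild the page as of lsn from the closest earlier image."""
        image_lsns = sorted(self.images)
        base_lsn = image_lsns[bisect_right(image_lsns, lsn) - 1]
        value = self.images[base_lsn]
        self.last_replayed = 0
        for delta_lsn, amount in self.deltas:
            if base_lsn < delta_lsn <= lsn:
                value += amount
                self.last_replayed += 1
        return value

    def materialize(self, lsn):
        """Store a full image at lsn so later reads replay fewer deltas."""
        self.images[lsn] = self.page_at(lsn)


page = PageHistory()
for lsn, amount in [(10, 5), (20, 7), (30, -2)]:
    page.add_delta(lsn, amount)

print(page.page_at(30), page.last_replayed)  # prints: 10 3
print(page.page_at(15))                      # prints: 5 (an older version of the page)

page.materialize(20)
print(page.page_at(30), page.last_replayed)  # prints: 10 1
```

The same lookup serves both the current version of a page and a historical one: only the requested LSN changes. Storing an image at LSN 20 does not change any answer; it only shortens the replay.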
The next fascinating aspect of Neon’s HA is how it handles read replicas, which are essentially just additional compute nodes reading from the same shared storage.