Neon Read Replicas: Scale Read Traffic Horizontally (2026)

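Neon’s read replicas let you scale read traffic by adding read-only compute instances that serve queries separately from the primary. Unlike traditional Postgres replicas, they read from the same storage as the primary compute rather than from a full copy of the data.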

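Here’s a minimal Python sketch of that split: one function stands in for writes sent to the primary and another for reads served by a replica. It only prints what would happen; no database is contacted.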

# Simulate a write operation on the primary
def write_to_primary(data):
    print(f"Writing to primary: {data}")
    # In a real system, this would be a SQL INSERT/UPDATE/DELETE

# Simulate a read operation on a replica
def read_from_replica(query):
    print(f"Executing on replica: {query}")
    # In a real system, this would be a SQL SELECT
    return f"Results for '{query}' from replica"

# Example usage
write_to_primary("INSERT INTO users (name) VALUES ('Alice')")
print(read_from_replica("SELECT * FROM users WHERE name = 'Alice'"))
print(read_from_replica("SELECT COUNT(*) FROM orders"))

This setup is designed to offload read-heavy workloads. Instead of every SELECT query hitting the primary and competing with writes for CPU, memory, and I/O, you direct reads to the replicas. The primary handles all the writes (INSERT, UPDATE, DELETE) and remains the single source of truth, while the replicas serve the read traffic.

The core mechanism is asynchronous replication. When data changes on the primary, those changes are recorded in a log. The log is then streamed to the read replicas, which replay it to bring their view of the data up to date. This process isn’t instantaneous: there is a small delay, known as replication lag, between a write on the primary and its appearance on a replica. For most read-heavy applications this lag is acceptable, but a read issued immediately after a write may not see that write yet.
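To make replication lag concrete, here is a toy in-memory model of asynchronous log shipping. The `Primary` and `Replica` classes are hypothetical names invented for this illustration; they are not Neon’s implementation, only a sketch of why a read can be stale until the replica has replayed the log.

```python
class Primary:
    """Toy primary: applies writes and appends them to an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered change records, oldest first

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Toy replica: sees a write only after replaying its log record."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # number of log records replayed so far

    def replay(self):
        # Apply every record that arrived since the last replay.
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

    def lag(self):
        # Lag measured in unreplayed records (real systems use bytes or time).
        return len(self.primary.log) - self.applied

    def read(self, key):
        return self.data.get(key)


primary = Primary()
replica = Replica(primary)

primary.write("alice", "active")
stale = replica.read("alice")    # the write has not been replayed yet
lag_before = replica.lag()

replica.replay()
fresh = replica.read("alice")    # now visible on the replica
lag_after = replica.lag()
```

In this model `stale` is `None` and `fresh` is `"active"`: the only thing that changed between the two reads is that the replica caught up with the log.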

You control which replica serves a read query. In a typical application architecture, you’d have a load balancer or a routing layer that directs SELECT statements to one of the read replicas and all other statements to the primary.
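A routing layer of this kind can be sketched in a few lines. The `Router` class and the endpoint strings below are hypothetical, and classifying a statement by its first keyword is a deliberate simplification: it would misroute cases such as `SELECT ... FOR UPDATE` or a `WITH` query that modifies data, so treat this as a starting point rather than a complete solution.

```python
import itertools


class Router:
    """Send plain SELECTs to replicas in round-robin order, the rest to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # itertools.cycle repeats the replica list indefinitely.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        # Naive rule: only statements that start with SELECT count as reads.
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = Router(
    primary="primary.example.com",
    replicas=["replica-1.example.com", "replica-2.example.com"],
)

for statement in (
    "INSERT INTO users (name) VALUES ('Alice')",
    "SELECT * FROM users WHERE name = 'Alice'",
    "SELECT COUNT(*) FROM orders",
):
    print(f"{router.route(statement)} <- {statement}")
```

Round-robin is the simplest distribution policy. If the replicas differ in size or lag, a weighted or lag-aware choice may suit better.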

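Here’s a simplified, illustrative description of a primary with two read replicas. The field names and hostnames are placeholders chosen for readability; they are not Neon’s actual API schema or connection strings.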

{
  "database": {
    "id": "primary-db-123",
    "name": "my_app_db",
    "endpoint": "primary.neon.tech",
    "read_replicas": [
      {
        "id": "replica-abc",
        "name": "my_app_db_replica_1",
        "endpoint": "replica-1.neon.tech",
        "region": "us-east-1"
      },
      {
        "id": "replica-def",
        "name": "my_app_db_replica_2",
        "endpoint": "replica-2.neon.tech",
        "region": "us-east-1"
      }
    ]
  }
}
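If your application keeps a description like this in its own settings, a small helper can extract the endpoints for the routing layer. This is a sketch against the illustrative structure above; `endpoints_from_config` is a hypothetical helper, and the JSON layout is the article’s placeholder format, not something Neon produces.

```python
import json

# A trimmed copy of the illustrative configuration shown above.
CONFIG_TEXT = """
{
  "database": {
    "endpoint": "primary.neon.tech",
    "read_replicas": [
      {"id": "replica-abc", "endpoint": "replica-1.neon.tech"},
      {"id": "replica-def", "endpoint": "replica-2.neon.tech"}
    ]
  }
}
"""


def endpoints_from_config(config):
    """Return (primary_endpoint, [replica_endpoints]) from the parsed config."""
    database = config["database"]
    replicas = [r["endpoint"] for r in database.get("read_replicas", [])]
    return database["endpoint"], replicas


primary_endpoint, replica_endpoints = endpoints_from_config(json.loads(CONFIG_TEXT))
print(primary_endpoint)
print(replica_endpoints)
```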

The problem read replicas solve is the bottleneck of a single database instance handling both writes and reads. As your application grows and more users read data, the primary can become overloaded. This increases latency for both reads and writes, and can even cause downtime if the primary can’t keep up. Distributing the read load across multiple replicas increases the total read throughput of your system.

Internally, Neon uses a log-based replication system. Changes are captured in the Write-Ahead Log (WAL) on the primary, and read replicas consume that WAL stream to stay up to date. Because Neon separates compute from storage, replicas do not maintain their own full copy of the data files; they read from the same storage as the primary. Each replica is its own compute instance, which allows different performance characteristics between the primary and the replicas.

The key levers you control are the number of read replicas you provision and how you route traffic to them. You can also monitor replication lag for each replica to ensure it’s within acceptable limits for your application. If lag becomes too high, it might indicate a bottleneck on the replica itself or a problem with the network between the primary and the replica.
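One common way to quantify lag in PostgreSQL is to compare WAL positions (LSNs): `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on a replica. The sketch below assumes you have already fetched those two values as text (for example `0/16B3748`) and only does the arithmetic. The helper names and the threshold are hypothetical, and whether both functions are available on your Neon computes is an assumption to verify.

```python
def lsn_to_int(lsn):
    """Convert a PostgreSQL LSN such as '0/16B3748' to a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(primary_lsn, replica_lsn):
    """Bytes of WAL the replica still has to replay (never negative)."""
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn))


def is_lagging(primary_lsn, replica_lsn, max_lag_bytes):
    """True when the replica is further behind than the allowed threshold."""
    return replay_lag_bytes(primary_lsn, replica_lsn) > max_lag_bytes


# Hypothetical values: the replica is 0x1000 (4096) bytes behind the primary.
lag = replay_lag_bytes("0/2000", "0/1000")
print(lag)
print(is_lagging("0/2000", "0/1000", max_lag_bytes=16 * 1024 * 1024))
```

A byte-based measure shows how much work the replica has left. A time-based measure, such as the difference between `now()` and `pg_last_xact_replay_timestamp()` on the replica, is often easier to map to application requirements, though it overstates lag when the primary is idle.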

What many people overlook is that in traditional streaming-replication setups, read replicas are not just for scaling; they are also a component of disaster recovery. If the primary instance fails, one of the replicas can be promoted to become the new primary, minimizing downtime, provided it has replayed all available WAL records up to the point of failure. Because Neon’s replicas share storage with the primary, don’t assume this promotion pattern carries over unchanged; confirm how Neon handles compute failures before building a recovery plan around replicas.

The next concept you’ll likely encounter is managing replication lag and its impact on data consistency guarantees.