Neo4j’s Causal Cluster doesn’t achieve high availability by writing to every node at once. Instead, it favors consistency: one core server is elected "leader" and accepts all writes, and the other core servers replicate the leader’s changes so that they mirror its state.
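To make the leader-based write path concrete, here is a toy model. Core, majority, and replicate_write are hypothetical names, not part of any Neo4j API. Replication is simplified to a synchronous loop, whereas real followers acknowledge over the network. The leader appends a write, copies it to every reachable follower, and treats it as committed once a majority of the cluster (the leader included) holds it.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy core server (hypothetical): a name, an up/down flag, and a log."""
    name: str
    up: bool = True
    log: list[str] = field(default_factory=list)


def majority(cluster_size: int) -> int:
    """Smallest number of members that is more than half the cluster."""
    return cluster_size // 2 + 1


def replicate_write(leader: Core, followers: list[Core], tx: str) -> bool:
    """Append tx on the leader, copy it to reachable followers, and
    return True if a majority of the cluster now holds it (committed)."""
    if not leader.up:
        raise RuntimeError("writes must go to a live leader")
    leader.log.append(tx)
    acks = 1  # the leader's own copy counts toward the majority
    for follower in followers:
        if follower.up:  # a down follower cannot acknowledge
            follower.log.append(tx)
            acks += 1
    return acks >= majority(1 + len(followers))


if __name__ == "__main__":
    core1, core2, core3 = Core("core1"), Core("core2"), Core("core3")
    print(replicate_write(core1, [core2, core3], "tx1"))  # True: 3 of 3
    core3.up = False
    print(replicate_write(core1, [core2, core3], "tx2"))  # True: 2 of 3
    core2.up = False
    print(replicate_write(core1, [core2, core3], "tx3"))  # False: 1 of 3
```

With three cores, writes keep committing with one core down but not with two. The same majority rule decides whether the survivors can elect a new leader.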
Let’s see this in action. Imagine we have a Causal Cluster with three core servers: core1, core2, and core3.
Clustering is an Enterprise Edition feature, and the Causal Cluster settings used here belong to Neo4j 4.x. The commands below therefore use the neo4j:4.4-enterprise image rather than neo4j:latest, which is a Community Edition image. All three cores run on one Docker host and share a network so they can reach each other by hostname. Each core publishes its Bolt and HTTP ports on different host ports.

```bash
# Shared network so the containers can resolve core1, core2, core3 by name
docker network create --driver=bridge cluster

# core1
docker run --name=core1 --detach --network=cluster --hostname=core1 \
  --publish=7474:7474 --publish=7687:7687 \
  -v neo4j_data_core1:/data \
  -e NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_dbms_mode=CORE \
  -e NEO4J_causal__clustering_expected__core__cluster__size=3 \
  -e NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000 \
  -e NEO4J_dbms_connector_bolt_advertised__address=localhost:7687 \
  -e NEO4J_dbms_connector_http_advertised__address=localhost:7474 \
  neo4j:4.4-enterprise

# core2 (same cluster settings, different host ports)
docker run --name=core2 --detach --network=cluster --hostname=core2 \
  --publish=8474:7474 --publish=8687:7687 \
  -v neo4j_data_core2:/data \
  -e NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_dbms_mode=CORE \
  -e NEO4J_causal__clustering_expected__core__cluster__size=3 \
  -e NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000 \
  -e NEO4J_dbms_connector_bolt_advertised__address=localhost:8687 \
  -e NEO4J_dbms_connector_http_advertised__address=localhost:8474 \
  neo4j:4.4-enterprise

# core3 (same cluster settings, different host ports)
docker run --name=core3 --detach --network=cluster --hostname=core3 \
  --publish=9474:7474 --publish=9687:7687 \
  -v neo4j_data_core3:/data \
  -e NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_dbms_mode=CORE \
  -e NEO4J_causal__clustering_expected__core__cluster__size=3 \
  -e NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000 \
  -e NEO4J_dbms_connector_bolt_advertised__address=localhost:9687 \
  -e NEO4J_dbms_connector_http_advertised__address=localhost:9474 \
  neo4j:4.4-enterprise
```

Once the three cores have discovered each other, they hold a Raft election and one of them becomes the leader. The winner is not fixed in advance; core1 does not win just because it started first. All write operations go to the leader. The other two cores act as followers: they receive the leader’s transactions and apply them to their own stores, so every core converges on the same state. If the leader fails, the two remaining cores still form a majority (2 of 3) and elect a new leader between them. Reads are normally served by the followers, and by read replicas if you add them, which spreads the read load.

The core problem this solves is keeping your graph database available, with its data consistent, when servers fail. Causal Clustering is built on the Raft consensus protocol, which handles both replication and leader election. The cluster consists of core servers, which store the data and take part in committing writes, and optional read-replica servers, which only serve reads. Every write transaction goes first to the leader, which replicates it to the other cores. The transaction commits only once a majority (quorum) of cores, the leader included, have acknowledged it. Any two majorities share at least one member, so a committed write survives the loss of the leader: whichever cores elect the next leader, at least one of them already has it. The trade-off is that only a minority of cores may fail. Three cores tolerate one failure, five tolerate two, and a cluster that loses its majority stops accepting writes.

NEO4J_dbms_mode=CORE makes each server a core member; a read replica would use READ_REPLICA instead. NEO4J_causal__clustering_expected__core__cluster__size=3 tells each core how many members to wait for before it forms the initial cluster. NEO4J_causal__clustering_initial__discovery__members is the seed list. On startup, a server contacts these discovery addresses (port 5000 by default) to find the other members and join the cluster. After that, the discovery service tracks membership, and the seed list is no longer needed. Cluster-internal traffic runs over the shared Docker network using the container hostnames: discovery uses port 5000, transaction shipping 6000, and Raft 7000 by default. The advertised Bolt and HTTP addresses serve a different purpose. The cluster hands them to clients in routing tables, so they must be reachable from wherever your application runs; here, that means the published ports on localhost.

Here’s how you’d typically interact with it:
- Writes: Connect with a routing URI such as neo4j://localhost:7687 and run the transaction in write mode. Any core’s published Bolt port works as the entry point. Each core can act as a router, answering the driver’s routing request with the current leader (the writer), the servers available for reads, and the routers. The driver then sends the write straight to the leader.
- Reads: Run the transaction in read mode. The driver spreads reads across the readers in the routing table, which are normally the followers and any read replicas.
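Whichever component does the routing, the decision itself is simple. Writes go to whichever core is leader right now, and reads rotate across the servers that can serve them. The sketch below assumes a hypothetical RoutingTable that lists the current leader as the only writer. RoutingTable and ToyRouter are illustrative names, not a Neo4j driver API, and refresh stands in for fetching a new table after the leader changes.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class RoutingTable:
    """Hypothetical stand-in for a cluster's routing table."""
    writers: list[str]  # the current leader
    readers: list[str]  # servers allowed to serve reads


class ToyRouter:
    """Picks a server per transaction: leader for writes, round-robin for reads."""

    def __init__(self, table: RoutingTable) -> None:
        self.refresh(table)

    def refresh(self, table: RoutingTable) -> None:
        # Called with a new table, e.g. after a leader election.
        self.table = table
        self._next_reader = cycle(table.readers)

    def pick_server(self, access_mode: str) -> str:
        if access_mode == "WRITE":
            if not self.table.writers:
                raise RuntimeError("no leader known; writes cannot proceed")
            return self.table.writers[0]
        if access_mode == "READ":
            if not self.table.readers:
                raise RuntimeError("no readers known")
            return next(self._next_reader)
        raise ValueError(f"unknown access mode: {access_mode!r}")


if __name__ == "__main__":
    router = ToyRouter(RoutingTable(writers=["core1:7687"],
                                    readers=["core2:7687", "core3:7687"]))
    print(router.pick_server("WRITE"))  # core1:7687
    print(router.pick_server("READ"))   # core2:7687
    print(router.pick_server("READ"))   # core3:7687
    # Suppose core1 fails and core2 wins the new election.
    router.refresh(RoutingTable(writers=["core2:7687"], readers=["core3:7687"]))
    print(router.pick_server("WRITE"))  # core2:7687
```

Keeping the table as data that can be swapped wholesale mirrors why failover is mostly invisible to applications. Only the table changes, not the code that asks for a writer or a reader.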
A subtle but critical aspect is how reads are handled. Writes must go to the leader, but reads are served by followers or read replicas, which may lag slightly behind the leader. Causal consistency closes that gap through bookmarks. After a transaction commits, the driver receives a bookmark identifying it. When a later transaction carries that bookmark, the server executing it first waits until it has applied the bookmarked transaction. A read that follows a write is therefore guaranteed to see that write. Within a single session, the driver passes bookmarks along automatically; across sessions, you pass them explicitly. If you need a read to run on the leader itself, you can route it there by executing it in a write-mode session. This adds load to the leader and can increase latency.
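The bookmark guarantee can be sketched with a toy cluster of one leader and one lagging follower. Server, ToyCluster, and the integer bookmarks are hypothetical simplifications; real bookmarks are opaque values exchanged between driver and server. A read that carries a bookmark first waits until the follower has applied that transaction. A read without one may see stale data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    """Toy server (hypothetical) that tracks the last transaction it applied."""
    name: str
    applied: int = 0
    data: dict[str, str] = field(default_factory=dict)


class ToyCluster:
    """One leader plus one lagging follower; bookmarks are plain tx ids here."""

    def __init__(self) -> None:
        self.leader = Server("leader")
        self.follower = Server("follower")
        # Committed on the leader but not yet applied by this follower.
        self.pending: list[tuple[int, str, str]] = []

    def write(self, key: str, value: str) -> int:
        """Apply a write on the leader and return its bookmark."""
        self.leader.applied += 1
        self.leader.data[key] = value
        self.pending.append((self.leader.applied, key, value))
        return self.leader.applied

    def _apply_next(self) -> None:
        """Let replication deliver the oldest pending transaction."""
        tx_id, key, value = self.pending.pop(0)
        self.follower.data[key] = value
        self.follower.applied = tx_id

    def read(self, key: str, bookmark: int = 0, max_steps: int = 10) -> Optional[str]:
        """Read on the follower, first waiting until it has applied `bookmark`."""
        steps = 0
        while self.follower.applied < bookmark:
            if steps >= max_steps or not self.pending:
                raise TimeoutError("follower has not caught up to the bookmark")
            self._apply_next()  # stands in for waiting on replication
            steps += 1
        return self.follower.data.get(key)


if __name__ == "__main__":
    cluster = ToyCluster()
    bookmark = cluster.write("name", "Ada")
    print(cluster.read("name"))            # None: no bookmark, stale read
    print(cluster.read("name", bookmark))  # Ada: waited for tx 1
```

The follower, not the client, does the waiting, so a bookmarked read costs extra latency only when the chosen server is actually behind.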
The next thing you’ll likely want to learn is how to configure read-replica servers to scale read throughput beyond what the core servers can handle alone.