Neo4j CDC streams let you react to individual graph changes as they happen, rather than working from periodic snapshots of the entire graph.

Let’s see this in action. Imagine we have a simple graph of users and their friendships. When a new user, Alice, is created, the stream emits an event along these lines:

{
  "type": "NODE",
  "payload": {
    "before": null,
    "after": {
      "id": "user:123",
      "labels": ["User"],
      "properties": {
        "name": "Alice"
      }
    }
  }
}

This is a NODE change, specifically a node creation. before is null because the node did not exist prior to this change. after shows the new User node with ID user:123 and the property name: "Alice".

Now, let’s say Alice befriends Bob.

{
  "type": "RELATIONSHIP",
  "payload": {
    "before": null,
    "after": {
      "id": "friendship:456",
      "type": "FRIENDS_WITH",
      "startNode": "user:123",
      "endNode": "user:789",
      "properties": {}
    }
  }
}

This is a RELATIONSHIP change. Again, before is null as the FRIENDS_WITH relationship didn’t exist. after details the new relationship with ID friendship:456, connecting user:123 (Alice) to user:789 (Bob).

If Alice changes her name to "Alice Wonderland":

{
  "type": "NODE",
  "payload": {
    "before": {
      "id": "user:123",
      "labels": ["User"],
      "properties": {
        "name": "Alice"
      }
    },
    "after": {
      "id": "user:123",
      "labels": ["User"],
      "properties": {
        "name": "Alice Wonderland"
      }
    }
  }
}

This is a NODE change again, but this time both before and after are populated. We see the user:123 node, with its name property updated from "Alice" to "Alice Wonderland".
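The three examples suggest a simple rule for telling operations apart: look at which of before and after is populated. A minimal Python sketch of that rule follows. The function name classify_change is hypothetical, and the delete branch assumes that deletions arrive with a null after state, which the examples above do not show.

```python
from typing import Any, Dict

Event = Dict[str, Any]


def classify_change(event: Event) -> str:
    """Classify an event as 'create', 'update', or 'delete' from its before/after states."""
    payload = event["payload"]
    before = payload.get("before")
    after = payload.get("after")
    if before is None and after is None:
        raise ValueError("event carries neither a before nor an after state")
    if before is None:
        return "create"  # nothing existed before the change
    if after is None:
        return "delete"  # assumption: deletions arrive with a null after state
    return "update"      # both states are present


node_created = {
    "type": "NODE",
    "payload": {
        "before": None,
        "after": {"id": "user:123", "labels": ["User"], "properties": {"name": "Alice"}},
    },
}
name_updated = {
    "type": "NODE",
    "payload": {
        "before": {"id": "user:123", "labels": ["User"], "properties": {"name": "Alice"}},
        "after": {"id": "user:123", "labels": ["User"], "properties": {"name": "Alice Wonderland"}},
    },
}

print(classify_change(node_created))  # create
print(classify_change(name_updated))  # update
```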

The core problem CDC streams solve is enabling event-driven architectures where your downstream systems (search indexes, caches, other databases, analytics platforms) can stay eventually consistent with your Neo4j graph without constant polling or expensive full graph exports. Instead of asking "what’s changed since last time?", you receive a stream of "this specific thing changed, here’s what it was before and after."
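As an illustration of a downstream system staying in sync, here is a small Python sketch that keeps an in-memory cache of user properties up to date by applying events one at a time. The cache layout and the function name apply_node_event are assumptions made for this example, and a real consumer would read events from a stream rather than a list.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def apply_node_event(cache: Dict[str, Dict[str, Any]], event: Event) -> None:
    """Apply one NODE event to a cache keyed by node ID; other event types are ignored."""
    if event["type"] != "NODE":
        return
    before = event["payload"].get("before")
    after = event["payload"].get("after")
    if after is not None:
        # Creation or update: store the latest properties.
        cache[after["id"]] = dict(after["properties"])
    elif before is not None:
        # Assumption: a deletion has a null after state, so drop the entry.
        cache.pop(before["id"], None)


events: List[Event] = [
    {"type": "NODE", "payload": {
        "before": None,
        "after": {"id": "user:123", "labels": ["User"], "properties": {"name": "Alice"}}}},
    {"type": "RELATIONSHIP", "payload": {
        "before": None,
        "after": {"id": "friendship:456", "type": "FRIENDS_WITH",
                  "startNode": "user:123", "endNode": "user:789", "properties": {}}}},
    {"type": "NODE", "payload": {
        "before": {"id": "user:123", "labels": ["User"], "properties": {"name": "Alice"}},
        "after": {"id": "user:123", "labels": ["User"], "properties": {"name": "Alice Wonderland"}}}},
]

user_cache: Dict[str, Dict[str, Any]] = {}
for event in events:
    apply_node_event(user_cache, event)

print(user_cache)  # {'user:123': {'name': 'Alice Wonderland'}}
```

The cache never asks the graph what changed; it only applies what the stream tells it.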

Internally, Neo4j’s Change Data Capture (CDC) mechanism works by tapping into the database’s transaction log. Every modification to the graph (node creation, property updates, relationship deletion) is recorded in this log. CDC reads these log entries, transforms them into structured change events like the JSON examples above, and makes them available to consumers, for example through a Kafka topic. Because the events are derived from committed transactions, they are ordered, immutable, and represent the precise state transitions of your graph data.

How you enable CDC depends on your Neo4j version, so check the documentation for the release you run. In recent Neo4j 5 releases, CDC is switched on per database rather than in neo4j.conf, by setting the txLogEnrichment database option, for example ALTER DATABASE neo4j SET OPTION txLogEnrichment 'FULL'. This tells Neo4j to enrich the transaction log with the before and after state that change events need. To deliver those events to Kafka, you typically run the Neo4j Connector for Kafka as a source connector using its CDC strategy, with the Kafka broker addresses (e.g., localhost:9092) and the target topic name (e.g., neo4j.changes) configured in your Kafka Connect deployment. Events are often serialized as JSON for human-readability.

The most striking aspect of CDC is its granularity, which makes near real-time synchronization possible down to individual property changes. An event doesn’t just tell you that a node was modified; because it carries both the before and after state, you can tell exactly which properties changed, along with their old and new values. This level of detail is crucial for applications that need to react to specific attribute updates, like updating a real-time recommendation engine when a user’s preferences change, or invalidating a cache entry only for the specific data that has become stale.
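To make the property-level detail concrete, the following Python sketch derives the set of changed properties by comparing the before and after states of an event shaped like the examples above. The helper name changed_properties and the extra email property are hypothetical and added only for illustration.

```python
from typing import Any, Dict, Tuple

_MISSING = object()  # distinguishes "property absent" from "property set to None"


def changed_properties(event: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed property name to an (old, new) pair; absent values appear as None."""
    payload = event["payload"]
    old = (payload.get("before") or {}).get("properties", {})
    new = (payload.get("after") or {}).get("properties", {})
    changes: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key, _MISSING) != new.get(key, _MISSING):
            changes[key] = (old.get(key), new.get(key))
    return changes


rename_event = {
    "type": "NODE",
    "payload": {
        "before": {"id": "user:123", "labels": ["User"],
                   "properties": {"name": "Alice", "email": "alice@example.com"}},
        "after": {"id": "user:123", "labels": ["User"],
                  "properties": {"name": "Alice Wonderland", "email": "alice@example.com"}},
    },
}

diff = changed_properties(rename_event)
print(diff)  # {'name': ('Alice', 'Alice Wonderland')}
```

A consumer could use a check such as "name" in diff to invalidate only the cache entries that depend on the user’s name and leave everything else untouched.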

The next conceptual hurdle you’ll encounter is handling the eventual consistency guarantees and designing idempotent consumers for your CDC events.
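One common way to make a consumer idempotent is to remember which events it has already applied and skip any that are delivered again. The Python sketch below shows the idea under a loud assumption: each event carries a unique eventId field, which the simplified examples in this article do not include. The class name and the in-memory set are also hypothetical; a production consumer would need durable storage for the processed IDs.

```python
from typing import Any, Dict, Set


class IdempotentNodeConsumer:
    """Applies node events at most once, keyed by a unique event ID."""

    def __init__(self) -> None:
        self.processed_ids: Set[str] = set()
        self.nodes: Dict[str, Dict[str, Any]] = {}

    def handle(self, event: Dict[str, Any]) -> bool:
        """Apply the event and return True, or return False if it was already applied."""
        event_id = event["eventId"]  # hypothetical field, not part of the examples above
        if event_id in self.processed_ids:
            return False  # redelivered event: skip it so state is not rolled back
        before = event["payload"].get("before")
        after = event["payload"].get("after")
        if after is not None:
            self.nodes[after["id"]] = dict(after["properties"])
        elif before is not None:
            self.nodes.pop(before["id"], None)
        self.processed_ids.add(event_id)
        return True


created = {"eventId": "evt-1", "type": "NODE", "payload": {
    "before": None,
    "after": {"id": "user:123", "labels": ["User"], "properties": {"name": "Alice"}}}}
renamed = {"eventId": "evt-2", "type": "NODE", "payload": {
    "before": {"id": "user:123", "labels": ["User"], "properties": {"name": "Alice"}},
    "after": {"id": "user:123", "labels": ["User"], "properties": {"name": "Alice Wonderland"}}}}

consumer = IdempotentNodeConsumer()
# The creation event is delivered a second time after the rename.
results = [consumer.handle(e) for e in (created, renamed, created)]

print(results)         # [True, True, False]
print(consumer.nodes)  # {'user:123': {'name': 'Alice Wonderland'}}
```

Without the duplicate check, the redelivered creation event would overwrite the newer name with "Alice".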
