Neo4j’s PROFILE and EXPLAIN are your debuggers for query performance, but they don’t just show you what happened; they reveal why Neo4j chose a particular path, which is usually the more valuable insight.
Let’s see it in action. Imagine we have a simple graph: nodes labeled :Person connected by :FRIENDS_WITH relationships. We want to find all friends of a specific person, "Alice".
MATCH (a:Person {name: "Alice"})-[:FRIENDS_WITH]->(friend:Person)
RETURN friend.name
Running EXPLAIN on this query will give us a plan. It’s a tree of operations Neo4j will perform.
+-----------------+-------------------------------+
| Operator        | Details                       |
+-----------------+-------------------------------+
| ProduceResults  | friend.name                   |
| Projection      | friend.name                   |
| Filter          | friend:Person                 |
| Expand(All)     | (a)-[:FRIENDS_WITH]->(friend) |
| Filter          | a.name = "Alice"              |
| NodeByLabelScan | a:Person                      |
+-----------------+-------------------------------+
This is a simplified plan: real output adds columns such as estimated rows, and the exact operator names and layout vary between Neo4j versions. Plans are read from the bottom up, so this one tells us:
- NodeByLabelScan: Neo4j will first fetch every node with the Person label.
- Filter: it will keep only the Person nodes whose name property is "Alice".
- Expand(All): from each matching node, it will follow outgoing :FRIENDS_WITH relationships.
- Filter: it will keep only the neighbors that carry the Person label.
- Projection and ProduceResults: finally, it will return the name property of the connected Person nodes.
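To reproduce a plan like the one above, prefix the query with EXPLAIN. This is a minimal sketch using the same query as before; EXPLAIN only plans the query, so no data rows come back.

```cypher
// Plan only: the query is not executed and no rows are returned
EXPLAIN
MATCH (a:Person {name: "Alice"})-[:FRIENDS_WITH]->(friend:Person)
RETURN friend.name
```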
Now, let’s spice things up. What if our graph is huge, and we have millions of people? The query above might be too slow, because finding "Alice" means checking the name of every Person node. We’d add an index on :Person(name).
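// Create an index if it doesn't exist.
// person_name is an illustrative index name; the older
// CREATE INDEX ON :Person(name) form was removed in Neo4j 5.
CREATE INDEX person_name IF NOT EXISTS FOR (p:Person) ON (p.name);
// Now, let's PROFILE the query
PROFILE MATCH (a:Person {name: "Alice"})-[:FRIENDS_WITH]->(friend:Person)
RETURN friend.name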
The PROFILE command runs the query and returns the same kind of plan, annotated with execution statistics. The simplified version below shows only the row counts; real output also reports database hits for each operator.
+-----------------+-------------------------------------+-----------+
| Operator        | Details                             | Rows      |
+-----------------+-------------------------------------+-----------+
| ProduceResults  | friend.name                         | 1,000,000 |
| Projection      | friend.name                         | 1,000,000 |
| Filter          | friend:Person                       | 1,000,000 |
| Expand(All)     | (a)-[:FRIENDS_WITH]->(friend)       | 1,000,000 |
| NodeIndexSeek   | a:Person(name) WHERE name = "Alice" | 1         |
+-----------------+-------------------------------------+-----------+
Notice the NodeIndexSeek at the bottom of the PROFILE output. This is the key difference. Instead of scanning all Person nodes and filtering them by name, Neo4j uses the index to jump directly to the "Alice" node. The Rows column is the number of rows each operator produced: 1 means Neo4j found exactly one "Alice" node, and 1,000,000 means Alice has a million friends.
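If the profile still shows a label scan instead of an index seek, it is worth checking that the index exists and has finished populating. A minimal sketch, assuming Neo4j 4.2 or later, where the SHOW INDEXES command is available:

```cypher
// List all indexes; look for one on label Person, property name,
// and check that its state is ONLINE before re-running PROFILE
SHOW INDEXES;
```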
The mental model here is that Neo4j has a cost-based query planner. When you write a Cypher query, it doesn’t just execute it line by line. It considers alternative execution plans and picks the one it estimates will be cheapest, based on statistics about your data (like the number of nodes with each label, the number of relationships of each type, and the selectivity of indexes). EXPLAIN shows you the chosen plan with estimated row counts, without running the query. PROFILE runs the query and shows the same plan with the actual rows and database hits for each operator.
The most surprising thing is that Neo4j often doesn’t choose the most obvious plan, especially with complex queries. For instance, in a pattern like MATCH (a)-[:RELATES_TO]->(b) RETURN a, b, an index that could locate b does not guarantee the planner starts there. An index only comes into play when the query has a predicate on the indexed label and property, and even then, if the planner estimates that starting from a and expanding along RELATES_TO touches fewer rows than seeking b through the index and traversing backwards, it will start from a. You have to read the plan carefully to see which direction it chose.
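One way to explore the planner’s choice is to force a starting point with a planner hint and compare the PROFILE output with the unhinted query. The sketch below reuses the Person and FRIENDS_WITH names from earlier and assumes an index on :Person(name) exists. Hints are better treated as a diagnostic tool than as a default.

```cypher
// Force the planner to start from b via the index on :Person(name),
// then expand backwards along FRIENDS_WITH to find a
PROFILE
MATCH (a:Person)-[:FRIENDS_WITH]->(b:Person {name: "Alice"})
USING INDEX b:Person(name)
RETURN a.name, b.name
```

Running the same query without the USING INDEX line shows which starting point the planner picks on its own.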
The levers you control are primarily:
- Indexes: The most powerful tool. Create indexes on properties used in WHERE clauses or MATCH patterns.
- Labels and Relationship Types: Using specific labels and types guides Neo4j’s graph traversal.
- Query Structure: Reordering clauses, using WITH to pass intermediate results, and breaking down complex queries can influence the plan.
- Database Statistics: Neo4j collects statistics about your data. Outdated statistics can lead to suboptimal plans. Running CALL db.resampleOutdatedIndexes() refreshes index statistics that have gone stale.
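As a rough illustration of the last two levers, the sketch below restructures the friends query with WITH and then asks Neo4j to resample stale index statistics. Whether the restructured query yields a different plan depends on the data and the Neo4j version, so compare both forms with PROFILE before drawing conclusions.

```cypher
// Query structure: pin down Alice first, then expand from her
MATCH (a:Person {name: "Alice"})
WITH a
MATCH (a)-[:FRIENDS_WITH]->(friend:Person)
RETURN friend.name;

// Statistics: schedule resampling of indexes whose statistics are out of date
CALL db.resampleOutdatedIndexes();
```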
The next step after analyzing query plans is understanding how to optimize relationships, particularly when dealing with many-to-many connections or when you need to traverse relationships in the "wrong" direction (e.g., finding who is friends with Alice, not just who Alice is friends with, when the relationship is only stored in one direction).
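As a small preview, Cypher can match a relationship against its stored direction by reversing the arrow, or ignore direction by dropping the arrowhead. A minimal sketch on the same graph, with admirer and other as illustrative variable names:

```cypher
// Who lists Alice as a friend: same relationship type, arrow reversed
MATCH (a:Person {name: "Alice"})<-[:FRIENDS_WITH]-(admirer:Person)
RETURN admirer.name;

// Friends in either direction: omit the arrowhead
MATCH (a:Person {name: "Alice"})-[:FRIENDS_WITH]-(other:Person)
RETURN DISTINCT other.name;
```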