Neo4j’s monitoring is less about watching a dashboard and more about understanding the story your data is telling about its own performance.
Let’s watch a query run and see what happens.
# Start a Neo4j instance (using Docker for simplicity).
# Recent images reject the default password "neo4j", so choose your own;
# "change-me-please" is only a placeholder.
docker run --rm -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/change-me-please neo4j:latest
# Install the APOC plugin (for some later examples)
# This involves downloading the JAR and placing it in the plugins directory,
# or configuring it via environment variables if using Docker.
# For a quick local setup, you might manually copy it into
# $NEO4J_HOME/plugins/ if running standalone.
# Connect via cypher-shell
cypher-shell -u neo4j -p change-me-please
# Create some data. Variables such as a and c only exist within one statement,
# so the LIKES relationship is created in the same statement as the nodes.
CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})-[:KNOWS]->(c:Person {name: 'Charlie'}),
       (a)-[:LIKES]->(c);
# Run a query and observe the output in cypher-shell
# This is where the magic starts to happen.
MATCH (p:Person)-[:KNOWS]->(friend:Person) WHERE p.name = 'Alice' RETURN p.name, friend.name;
When you execute that last MATCH query, Neo4j doesn’t just fetch data. It goes through a process:
- Parsing: The query string is converted into an internal representation.
- Planning: The query planner analyzes the available indexes, data distribution, and the query structure to devise the most efficient execution strategy. This is crucial.
- Execution: The planned steps are carried out. This might involve index lookups, node scans, relationship traversals, and filtering.
- Result Streaming: The results are formatted and sent back to the client.
The key to monitoring is observing this process, especially the planning and execution phases, to identify bottlenecks.
The Core Monitoring Levers: Query Logs and Profiling
The most direct way to understand query performance is by looking at the logs and using Neo4j’s built-in profiling capabilities.
1. The Query Log (neo4j.log and query.log)
Neo4j writes various logs, but query.log (if enabled) is your primary source for slow queries.
- What it tells you: Records queries that exceed a configured execution time threshold. This is your first line of defense against performance degradation.
- How to enable/configure:
  - In neo4j.conf:
        # Log queries that take longer than 5 seconds. Give the unit explicitly;
        # a bare number is read as seconds, not milliseconds.
        dbms.logs.query.threshold=5s
        # Enable query logging. Neo4j 4.x accepts OFF, INFO, or VERBOSE;
        # INFO logs queries that exceed the threshold. Neo4j 3.5 accepts true or false.
        dbms.logs.query.enabled=INFO
    In Neo4j 5.x the same settings are named db.logs.query.threshold and db.logs.query.enabled.
  - Restart Neo4j after changing neo4j.conf.
- What to look for: Entries like this (simplified):
      2023-10-27 10:00:00.123+0000 INFO [o.n.k.i.QueryLog] Query(33) | MATCH (p:Person)-[:KNOWS]->(friend:Person) WHERE p.name = 'Alice' RETURN p.name, friend.name | 150 ms | 10000 | db.index.node.lookup: {Person} | 10000
  The key fields are listed below. Which of them appear, and in what order, depends on your Neo4j version and logging configuration.
  - Query(ID): Unique identifier for the query log entry.
  - Query String: The Cypher query itself.
  - Execution Time: How long the query took (e.g., 150 ms).
  - Query Type: READ, WRITE, SCHEMA.
  - Database Name: The database the query ran on.
  - Client Address: Where the query originated.
  - User: The user who ran the query.
- Why it works: By default, Neo4j is quite fast. If a query consistently appears here, it’s likely doing more work than it needs to, or it’s hitting an inefficient part of the graph or execution plan. The threshold is your personal "alert" for when things get too slow for your application’s needs.
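To make the "what to look for" step concrete, here is a minimal Python sketch that pulls slow entries out of a log. It assumes the simplified pipe-delimited layout of the sample entry above, which is not Neo4j's exact format, so the regular expression would need adapting to your real query.log. The function name `slow_queries` and the second sample line are made up for illustration.

```python
import re

# Assumed layout (copied from the simplified sample entry, not Neo4j's exact format):
#   <timestamp> INFO [...] Query(<id>) | <cypher> | <N> ms | ...
ELAPSED_RE = re.compile(r"\|\s*(\d+)\s*ms\s*\|")


def slow_queries(lines, threshold_ms):
    """Return (elapsed_ms, cypher) pairs at or above threshold_ms, slowest first."""
    found = []
    for line in lines:
        match = ELAPSED_RE.search(line)
        if match is None:
            continue  # not a query entry in the assumed layout
        elapsed_ms = int(match.group(1))
        if elapsed_ms >= threshold_ms:
            # In the assumed layout the Cypher text is the second "|" field.
            fields = [part.strip() for part in line.split("|")]
            found.append((elapsed_ms, fields[1]))
    return sorted(found, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data; the second line is invented for this sketch.
sample = [
    "2023-10-27 10:00:00.123+0000 INFO [o.n.k.i.QueryLog] Query(33) | "
    "MATCH (p:Person)-[:KNOWS]->(friend:Person) WHERE p.name = 'Alice' "
    "RETURN p.name, friend.name | 150 ms | 10000 | db.index.node.lookup: {Person} | 10000",
    "2023-10-27 10:00:01.000+0000 INFO [o.n.k.i.QueryLog] Query(34) | "
    "MATCH (n) RETURN count(n) | 20 ms | 1 | - | 1",
]

for elapsed_ms, cypher in slow_queries(sample, threshold_ms=100):
    print(f"{elapsed_ms} ms  {cypher}")
```

Splitting on "|" is a shortcut that only works while the Cypher text itself contains no pipe character; a real parser should anchor on the log's actual field separators.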
2. Query Execution Plans (EXPLAIN and PROFILE)
For any specific query you suspect is slow, you can ask Neo4j to show you how it plans to execute it, or how it actually executed it.
- What it tells you: The detailed step-by-step breakdown of how Neo4j will (or did) process your query. This is invaluable for understanding why a query is slow.
- How to use: In cypher-shell or any Neo4j client connected to your database, prefix the query with one of two Cypher keywords (they are part of the query, not colon commands):
  - EXPLAIN <your query>: Shows the planned execution plan without actually running the query. Useful for quick analysis.
  - PROFILE <your query>: Runs the query and shows the actual execution plan, including the rows and database hits at each step. This is the most revealing.
        # Example using PROFILE
        PROFILE MATCH (p:Person)-[:KNOWS]->(friend:Person) WHERE p.name = 'Alice' RETURN p.name, friend.name;
- What to look for in PROFILE output:
  - DB Hits and Time (shown as All alloc / All time in the simplified table below): Total storage operations and time for that step.
  - Rows and Estimated Rows: How many rows each step actually produced compared with what the planner expected. A large gap suggests the planner is working from poor estimates.
  - Index operators: NodeIndexSeek means an index is being used to jump straight to matching nodes. NodeIndexScan means the whole index was read, which is less selective.
  - NodeByLabelScan, AllNodesScan, and relationship type scans: These are often signs of trouble. They mean Neo4j had to look at all nodes or relationships of a certain label/type (or all nodes), which is usually inefficient for large graphs.
  - Filter: How many items were filtered out at a certain stage. A lot of filtering after a scan is inefficient.

      Step                   | Detail                  | All alloc | All time | Rows | Cache
      -----------------------+-------------------------+-----------+----------+------+------
      Statement (total)      | PROFILE MATCH ...       | 10000     | 150 ms   | 1    | 0 B
      NodeById(1)            | :Person                 | 1         | 0.1 ms   | 1    | 0 B
      Filter(1)              | p.name = 'Alice'        | 1         | 0.01 ms  | 1    | 0 B
      RelationshipExpand(1)  | (:Person)-[:KNOWS]->()  | 1         | 0.05 ms  | 1    | 0 B
      NodeByLabel(1)         | :Person                 | 1         | 0.03 ms  | 1    | 0 B
      Return(1)              |                         | 1         | 0.01 ms  | 1    | 0 B

  (Note: This is a simplified, illustrative example. Real output is more verbose, names its operators NodeByLabelScan, NodeIndexSeek, Filter, Expand(All), ProduceResults, and so on, and reports columns such as Estimated Rows, Rows, and DB Hits.)
- Why it works: PROFILE is like a detailed autopsy of a query. It shows you exactly which steps consumed the most time and resources. If you see a NodeByLabelScan on a label with millions of nodes, you know you need an index. If you see a lot of filtering after a relationship expand, you might need to rethink your query or data model.
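The habit of scanning a profile for full scans can be automated. The sketch below walks a plan tree and reports scan operators together with their database hits. The plan is represented as a nested dict whose keys (`operatorType`, `dbHits`, `children`) are assumptions chosen for this example, and the numbers are invented; check what your driver or client actually returns before relying on these names.

```python
# Operators that read every node, or every node with a given label.
SCAN_OPERATORS = ("AllNodesScan", "NodeByLabelScan")


def find_scans(plan, min_db_hits=0):
    """Walk the plan tree and return (operator, db_hits) for each scan operator."""
    flagged = []
    stack = [plan]
    while stack:
        node = stack.pop()
        operator = node.get("operatorType", "")
        db_hits = node.get("dbHits", 0)
        # startswith tolerates suffixes some clients append to the operator name
        if operator.startswith(SCAN_OPERATORS) and db_hits >= min_db_hits:
            flagged.append((operator, db_hits))
        stack.extend(node.get("children", []))
    return flagged


# Hypothetical plan for the Alice query when no index exists on :Person(name).
plan = {
    "operatorType": "ProduceResults", "dbHits": 0, "children": [
        {"operatorType": "Expand(All)", "dbHits": 3, "children": [
            {"operatorType": "Filter", "dbHits": 3, "children": [
                {"operatorType": "NodeByLabelScan", "dbHits": 4, "children": []},
            ]},
        ]},
    ],
}

print(find_scans(plan))
```

On a three-node graph a label scan is harmless; the `min_db_hits` argument is there so that only scans touching a meaningful amount of data get reported.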
Health Checks and Metrics
Beyond individual queries, Neo4j provides system-level metrics.
1. Neo4j Browser and Built-in Metrics
The Neo4j Browser itself offers some basic performance indicators when you run queries. For more in-depth metrics, use the metrics that Neo4j Enterprise Edition publishes.
- What it tells you: System health, memory usage, cache hit rates, transaction counts, etc.
- How to get metrics:
  - Neo4j Browser: Look for the summary information displayed after a query. The :sysinfo command shows a system overview.
  - Metrics files and Prometheus: By default, Enterprise Edition writes metrics as CSV files under the metrics directory. It can also serve them in Prometheus format once the endpoint is enabled in neo4j.conf:
        # Neo4j 5.x setting name; 4.x uses metrics.prometheus.enabled
        server.metrics.prometheus.enabled=true
  - JMX: Neo4j exposes metrics via JMX, which can be scraped by tools like Prometheus (with JMX Exporter), Datadog, or New Relic.
- What to look for:
- Cache Hit Rate: High hit rates (e.g., > 90%) for the page cache indicate efficient data retrieval. Low rates mean Neo4j is constantly reading from disk, which is slow.
- Transaction Throughput: Number of transactions per second. Spikes or drops can indicate issues.
- Heap Usage: Monitor Java heap memory. Excessive garbage collection (GC) can slow down the database.
- Page Cache Usage: How much memory is dedicated to caching graph data.
- Why it works: These metrics give you a bird’s-eye view of the database’s overall health. If the cache hit rate plummets, or GC activity becomes very high, it suggests underlying performance problems that might affect all queries.
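The cache hit rate in the list above is simply hits divided by total page requests. Here is a minimal Python sketch of that arithmetic, assuming you have already read two counters (page cache hits and page faults) from whichever metrics source you use. The function names and the 90% default are illustrative, with the default taken from the rule of thumb above.

```python
def page_cache_hit_ratio(hits, faults):
    """Fraction of page requests served from memory, or None if there were none."""
    total = hits + faults
    if total == 0:
        return None  # no page activity yet, so the ratio is undefined
    return hits / total


def needs_attention(hits, faults, minimum=0.90):
    """True when the hit ratio is defined and falls below the chosen minimum."""
    ratio = page_cache_hit_ratio(hits, faults)
    return ratio is not None and ratio < minimum


print(page_cache_hit_ratio(950, 50))
print(needs_attention(800, 200))
```

Counters like these are usually cumulative since startup, so for alerting it is more informative to compute the ratio over the difference between two samples than over the raw totals.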
Common Performance Pitfalls and Their Fixes
- Missing Indexes:
  - Diagnosis: PROFILE shows NodeByLabelScan or a relationship type scan on labels/types with many entities, especially when filtering or matching on properties. query.log shows slow MATCH or WHERE clauses on properties.
  - Fix: CREATE INDEX FOR (n:Label) ON (n.property);
    - Example: CREATE INDEX FOR (p:Person) ON (p.name);
    - Older material often shows CREATE INDEX ON :Label(property); that form was deprecated in Neo4j 4.x and removed in 5.x.
  - Why it works: Indexes allow Neo4j to quickly find specific nodes or relationships based on property values, avoiding full scans.
- Inefficient Relationship Traversal:
  - Diagnosis: PROFILE shows a large number of Expand operations or a high number of relationships traversed without a clear path.
  - Fix: Ensure you’re not traversing relationships unnecessarily. Sometimes, restructuring your query to start from a more specific node (e.g., using an index lookup first) is key. Consider if your data model needs adjustment, e.g., denormalizing some properties if they are frequently accessed across relationships.
  - Why it works: Neo4j excels at traversing relationships, but doing so millions of times within a single query is costly. Minimizing the number of traversals is crucial.
- Large Result Sets:
  - Diagnosis: Queries are fast in PROFILE but return thousands or millions of rows, causing client-side issues or slow network transfer. query.log might show fast execution times but a very large row count.
  - Fix: Use LIMIT and SKIP judiciously, or RETURN DISTINCT if appropriate. More importantly, rethink what data the client actually needs. Can you aggregate or filter further on the server?
  - Why it works: Transferring and processing massive result sets is a common bottleneck, even if the database itself computed them quickly.
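One way to apply SKIP and LIMIT is to page through results with query parameters rather than fetching everything at once. The Python sketch below only builds the query text and the parameter map; it does not talk to a database, and the names `PAGED_QUERY` and `page_params` are made up for illustration. Passing the resulting map to your driver's run call is left out, since that depends on the client you use.

```python
# Cypher accepts parameters for SKIP and LIMIT, so one query text serves every page.
# ORDER BY keeps the paging stable between requests.
PAGED_QUERY = (
    "MATCH (p:Person)-[:KNOWS]->(friend:Person) "
    "RETURN p.name AS person, friend.name AS friend "
    "ORDER BY person, friend "
    "SKIP $skip LIMIT $limit"
)


def page_params(page, page_size):
    """Translate a 1-based page number into SKIP/LIMIT parameter values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must both be at least 1")
    return {"skip": (page - 1) * page_size, "limit": page_size}


print(PAGED_QUERY)
print(page_params(page=3, page_size=50))
```

Large SKIP values still make the database produce and discard the skipped rows, so for deep paging it can be cheaper to filter on the last value seen (for example, a WHERE clause on the sort key) than to keep increasing SKIP.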
- Overly Complex Cypher:
  - Diagnosis: PROFILE shows many chained operations, deep nesting, or repeated subqueries.
  - Fix: Break down complex queries into smaller, manageable parts. Use WITH clauses to pass intermediate results and refine them. Sometimes, pre-calculating or denormalizing data can simplify queries.
  - Why it works: While Neo4j’s planner is sophisticated, extremely complex queries can still confuse it or lead to suboptimal execution. Simpler queries are easier to optimize.
- Insufficient Memory/Heap:
  - Diagnosis: High GC activity reported by JMX metrics, slow overall system performance, metrics showing low page cache hit rates.
  - Fix: Increase Neo4j’s JVM heap size (dbms.memory.heap.initial_size, dbms.memory.heap.max_size in neo4j.conf; in Neo4j 5.x these are server.memory.heap.initial_size and server.memory.heap.max_size). Ensure sufficient system RAM is available for both heap and page cache.
  - Why it works: Neo4j relies heavily on memory: the JVM heap for query execution and transaction state, and the page cache (which lives outside the heap) for graph data. An undersized heap leads to excessive garbage collection, and an undersized page cache leads to extra disk I/O. Either one cripples performance.
- Disk I/O Bottlenecks:
  - Diagnosis: Low page cache hit rates, high disk I/O wait times reported by the OS, query.log showing slow queries even when PROFILE doesn’t indicate obvious Cypher issues.
  - Fix: Use faster storage (SSDs). Ensure dbms.memory.pagecache.size in neo4j.conf (server.memory.pagecache.size in Neo4j 5.x) is adequately set. A common starting point is about 50% of system RAM, provided that still leaves room for the heap and the operating system.
  - Why it works: Even with a good cache hit rate, some disk access is inevitable. Slow disks will directly translate to slow query performance.
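Heap and page cache compete for the same physical RAM, so it helps to check that a proposed split actually fits. This is a rough Python sketch of that arithmetic, using the 50% page cache starting point mentioned above. The 1 GB operating system reserve and the function name `memory_plan` are assumptions made for the example, not Neo4j recommendations.

```python
def memory_plan(total_ram_gb, heap_gb, pagecache_fraction=0.5, os_reserve_gb=1.0):
    """Check whether heap + page cache + an OS reserve fit into physical RAM.

    pagecache_fraction follows the 50% rule of thumb; os_reserve_gb is an
    assumed allowance for the operating system and other processes.
    """
    pagecache_gb = total_ram_gb * pagecache_fraction
    headroom_gb = total_ram_gb - heap_gb - pagecache_gb - os_reserve_gb
    return {
        "heap_gb": heap_gb,
        "pagecache_gb": pagecache_gb,
        "headroom_gb": headroom_gb,
        "fits": headroom_gb >= 0,
    }


print(memory_plan(total_ram_gb=16, heap_gb=4))
print(memory_plan(total_ram_gb=16, heap_gb=8))
```

If the plan does not fit, the machine will start swapping, which tends to hurt far more than a slightly smaller page cache would, so shrink one of the two allocations rather than overcommitting.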
The next step after mastering query performance is understanding how to manage your graph schema and use APOC for advanced operations.