Neo4j’s monitoring is less about watching a dashboard and more about understanding the story your data is telling about its own performance.

Let’s watch a query run and see what happens.

# Start a Neo4j instance (using Docker for simplicity).
# Neo4j rejects the default password "neo4j" in NEO4J_AUTH, so choose your own;
# "change-me-please" is only a placeholder.
docker run --rm -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/change-me-please neo4j:latest

# Install the APOC plugin (for some later examples)
# This involves downloading the JAR and placing it in the plugins directory,
# or configuring it via environment variables if using Docker.
# For a quick local setup, you might manually copy it into
# $NEO4J_HOME/plugins/ if running standalone.

# Connect via cypher-shell (same placeholder password as above)
cypher-shell -u neo4j -p change-me-please

// Create some data
CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})-[:KNOWS]->(c:Person {name: 'Charlie'});
// Variables do not carry over between statements, so look up Alice and Charlie again
MATCH (a:Person {name: 'Alice'}), (c:Person {name: 'Charlie'})
CREATE (a)-[:LIKES]->(c);

// Run a query and observe the output in cypher-shell
// This is where the magic starts to happen.
MATCH (p:Person)-[:KNOWS]->(friend:Person) WHERE p.name = 'Alice' RETURN p.name, friend.name;

When you execute that last MATCH query, Neo4j doesn’t just fetch data. It goes through a process:

  1. Parsing: The query string is converted into an internal representation.
  2. Planning: The query planner analyzes the available indexes, data distribution, and the query structure to devise the most efficient execution strategy. This is crucial.
  3. Execution: The planned steps are carried out. This might involve index lookups, node scans, relationship traversals, and filtering.
  4. Result Streaming: The results are formatted and sent back to the client.

The key to monitoring is observing this process, especially the planning and execution phases, to identify bottlenecks.

The Core Monitoring Levers: Query Logs and Profiling

The most direct way to understand query performance is by looking at the logs and using Neo4j’s built-in profiling capabilities.

1. The Query Log (neo4j.log and query.log)

Neo4j writes various logs, but query.log (an Enterprise Edition feature, if enabled) is your primary source for slow queries.

  • What it tells you: Records queries that exceed a configured execution time threshold. This is your first line of defense against performance degradation.

  • How to enable/configure:

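    • In neo4j.conf (Neo4j 5 setting names; 4.x uses the dbms.logs.query.* prefix instead):
      # Log queries that take longer than 5 seconds. State the unit explicitly:
      # a bare number is read as seconds, not milliseconds.
      db.logs.query.threshold=5s
      # Ensure logging is enabled (OFF, INFO or VERBOSE; usually on by default)
      db.logs.query.enabled=INFO
      
    • Restart Neo4j after changing neo4j.conf.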
  • What to look for: Entries along these lines (simplified and illustrative; the exact layout varies by Neo4j version and configuration):

    2023-10-27 10:00:00.123+0000 INFO [o.n.k.i.QueryLog] Query(33) | MATCH (p:Person)-[:KNOWS]->(friend:Person) WHERE p.name = 'Alice' RETURN p.name, friend.name | 150 ms | 10000 | db.index.node.lookup: {Person} | 10000
    

    Depending on version and configuration, the key fields include:

    • Query(ID): Unique identifier for the query log entry.
    • Query String: The Cypher query itself.
    • Execution Time: How long the query took (e.g., 150 ms).
    • Query Type: READ, WRITE, SCHEMA.
    • Database Name: The database the query ran on.
    • Client Address: Where the query originated.
    • User: The user who ran the query.
  • Why it works: By default, Neo4j is quite fast. If a query consistently appears here, it’s likely doing more work than it needs to, or it’s hitting an inefficient part of the graph or execution plan. The threshold is your personal "alert" for when things get too slow for your application’s needs.
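
To triage a long query.log, a small script can pull out the entries above your own threshold. The following is a minimal sketch, not a parser for any specific Neo4j release: it assumes each entry reports its elapsed time as a "<number> ms" token, and the function name slow_entries and the sample lines are hypothetical. Check the pattern against your own log format before relying on it.

```python
import re

# Matches the first "<number> ms" token on a line, e.g. "150 ms".
# Assumption: every entry reports elapsed time this way; verify this against
# the query.log format of your Neo4j version.
ELAPSED_MS = re.compile(r"\b(\d+)\s*ms\b")


def slow_entries(lines, threshold_ms):
    """Return (elapsed_ms, line) pairs at or above threshold_ms, slowest first."""
    hits = []
    for line in lines:
        match = ELAPSED_MS.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample lines in the simplified layout used in this article.
    sample = [
        "2023-10-27 10:00:00.123+0000 INFO Query(33) | MATCH (p:Person) RETURN p | 150 ms",
        "2023-10-27 10:00:05.456+0000 INFO Query(34) | MATCH (n) RETURN count(n) | 7200 ms",
    ]
    for elapsed, entry in slow_entries(sample, threshold_ms=5000):
        print(elapsed, entry)
```

Sorting slowest-first keeps the worst offenders at the top, which is usually where PROFILE time is best spent.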

2. Query Execution Plans (EXPLAIN and PROFILE)

For any specific query you suspect is slow, you can ask Neo4j to show you how it plans to execute it, or how it actually executed it.

  • What it tells you: The detailed step-by-step breakdown of how Neo4j will (or did) process your query. This is invaluable for understanding why a query is slow.

  • How to use: In cypher-shell or any Neo4j client connected to your database, prefix the query with one of two Cypher keywords (plain keywords, not colon-prefixed client commands):

    • EXPLAIN <your query>: Shows the planned execution plan without actually running the query. Useful for quick analysis.
    • PROFILE <your query>: Runs the query and shows the actual execution plan, including the rows produced and the database hits at each step. This is the most revealing.
    // Example using PROFILE
    PROFILE MATCH (p:Person)-[:KNOWS]->(friend:Person) WHERE p.name = 'Alice' RETURN p.name, friend.name;
    
  • What to look for in PROFILE output:

    • Rows / Estimated Rows: How many rows each operator actually produced versus what the planner expected. A large gap means the planner was working from poor estimates.
    • DB Hits: An abstract count of storage-level operations for that step. The operators with the most DB hits are where the work is going.
    • NodeIndexSeek: Indicates that an index was used to find the starting nodes.
    • NodeByLabelScan / AllNodesScan / relationship type scans: These are often signs of trouble. They mean Neo4j had to look at all nodes or relationships of a certain label/type (or at every node), which is usually inefficient for large graphs.
    • Filter: Compare the rows going in with the rows coming out. A lot of filtering after a scan is inefficient.
    +------------------+--------------------------------+------+
    | Operator         | Details                        | Rows |
    +------------------+--------------------------------+------+
    | +ProduceResults  | `p.name`, `friend.name`        |    1 |
    | +Filter          | friend:Person                  |    1 |
    | +Expand(All)     | (p)-[:KNOWS]->(friend)         |    1 |
    | +Filter          | p.name = 'Alice'               |    1 |
    | +NodeByLabelScan | p:Person                       |    3 |
    +------------------+--------------------------------+------+
    

    (Note: This is an abridged illustration for the three-person graph above, with no index on the name property. Real output adds columns such as Estimated Rows, DB Hits, memory, and timing, and the exact operators and details vary by Neo4j version.)

  • Why it works: PROFILE is like a detailed autopsy of a query. It shows you exactly which steps consumed the most resources. If you see a NodeByLabelScan on a label with millions of nodes, you probably need an index. If you see a lot of filtering after a relationship expand, you might need to rethink your query or data model.
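
If you collect profiled plans programmatically, the "look for scans" check can be automated. The sketch below walks a plan represented as a nested dict and reports scan operators. The dict shape (operatorType, rows, children) and the function name find_scans are assumptions made for illustration; adapt the key names to whatever your driver or tooling actually returns.

```python
# Operator names that read every node, or every node with a given label.
SCAN_OPERATORS = ("AllNodesScan", "NodeByLabelScan")


def find_scans(plan, min_rows=0):
    """Yield (operator, rows) for scan operators that produced more than min_rows rows.

    `plan` is a nested dict with 'operatorType', 'rows' and 'children' keys.
    That shape is an assumption for this sketch, not a documented format.
    """
    operator = plan.get("operatorType", "")
    rows = plan.get("rows", 0)
    if operator.startswith(SCAN_OPERATORS) and rows > min_rows:
        yield operator, rows
    for child in plan.get("children", []):
        yield from find_scans(child, min_rows)


if __name__ == "__main__":
    # Hand-written plan for the Alice query with no index on the name property.
    plan = {
        "operatorType": "ProduceResults", "rows": 1, "children": [
            {"operatorType": "Expand(All)", "rows": 1, "children": [
                {"operatorType": "Filter", "rows": 1, "children": [
                    {"operatorType": "NodeByLabelScan", "rows": 3, "children": []},
                ]},
            ]},
        ],
    }
    for operator, rows in find_scans(plan):
        print(f"{operator} produced {rows} rows")
```

Raising min_rows keeps the check quiet on small labels, where a scan is harmless.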

Health Checks and Metrics

Beyond individual queries, Neo4j provides system-level metrics.

1. Neo4j Browser and the metrics endpoints

The Neo4j Browser itself offers some basic performance indicators when you run queries. For more in-depth metrics, use the metrics that Neo4j (Enterprise Edition) publishes itself.

  • What it tells you: System health, memory usage, cache hit rates, transaction counts, etc.
  • How to get metrics:
    • Neo4j Browser: Look for the summary information displayed after a query.
    • Prometheus endpoint: Neo4j can expose its metrics over HTTP for Prometheus to scrape.
      # In neo4j.conf (Neo4j 5 setting names; 4.x drops the "server." prefix)
      server.metrics.prometheus.enabled=true
      server.metrics.prometheus.endpoint=localhost:2004
      
    • JMX: Neo4j exposes metrics via JMX, which can be scraped by tools like Prometheus (with JMX Exporter), Datadog, or New Relic.
  • What to look for:
    • Cache Hit Rate: High hit rates (e.g., > 90%) for the page cache indicate efficient data retrieval. Low rates mean Neo4j is constantly reading from disk, which is slow.
    • Transaction Throughput: Number of transactions per second. Spikes or drops can indicate issues.
    • Heap Usage: Monitor Java heap memory. Excessive garbage collection (GC) can slow down the database.
    • Page Cache Usage: How much memory is dedicated to caching graph data.
  • Why it works: These metrics give you a bird’s-eye view of the database’s overall health. If the cache hit rate plummets, or GC activity becomes very high, it suggests underlying performance problems that might affect all queries.
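
The cache hit rate is simply hits divided by total page accesses. As a minimal sketch, assuming you have already read cumulative page cache hit and fault counters from your metrics source (the function names and the 0.90 target below are illustrative, echoing the figure above rather than any official threshold):

```python
def page_cache_hit_ratio(hits, faults):
    """Return hits / (hits + faults), or None when there were no accesses yet."""
    total = hits + faults
    if total == 0:
        return None
    return hits / total


def needs_attention(hits, faults, target=0.90):
    """True when the observed hit ratio falls below the target."""
    ratio = page_cache_hit_ratio(hits, faults)
    return ratio is not None and ratio < target


if __name__ == "__main__":
    # Hypothetical counter values: 950 hits, 50 faults.
    print(page_cache_hit_ratio(950, 50))
    print(needs_attention(950, 50))
```

Because the counters are cumulative, comparing the difference between two samples gives a better picture of current behaviour than the lifetime totals.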

Common Performance Pitfalls and Their Fixes

  1. Missing Indexes:

    • Diagnosis: PROFILE shows NodeByLabelScan or a relationship type scan on labels/types with many entities, especially when filtering or matching on properties. query.log shows slow MATCH or WHERE clauses on properties.
    • Fix: CREATE INDEX index_name FOR (n:Label) ON (n.property);
      • Example: CREATE INDEX person_name FOR (p:Person) ON (p.name);
      • This is the Neo4j 4.x and later syntax; the older CREATE INDEX ON :Label(property) form no longer works in Neo4j 5.
    • Why it works: Indexes allow Neo4j to quickly find specific nodes or relationships based on property values, avoiding full scans.
  2. Inefficient Relationship Traversal:

    • Diagnosis: PROFILE shows Expand operators producing a very large number of rows, or a high number of relationships traversed without a clear path.
    • Fix: Ensure you’re not traversing relationships unnecessarily. Sometimes, restructuring your query to start from a more specific node (e.g., using an index lookup first) is key. Consider if your data model needs adjustment, e.g., denormalizing some properties if they are frequently accessed across relationships.
    • Why it works: Neo4j excels at traversing relationships, but doing so millions of times within a single query is costly. Minimizing the number of traversals is crucial.
  3. Large Result Sets:

    • Diagnosis: Queries are fast in PROFILE but return thousands or millions of rows, causing client-side issues or slow network transfer. query.log might show fast execution times but a very large row count.
    • Fix: Use LIMIT and SKIP judiciously, or RETURN DISTINCT if appropriate. More importantly, rethink what data the client actually needs. Can you aggregate or filter further on the server?
    • Why it works: Transferring and processing massive result sets is a common bottleneck, even if the database itself computed them quickly.
  4. Overly Complex Cypher:

    • Diagnosis: PROFILE shows many chained operations, deep nesting, or repeated subqueries.
    • Fix: Break down complex queries into smaller, manageable parts. Use WITH clauses to pass intermediate results and refine them. Sometimes, pre-calculating or denormalizing data can simplify queries.
    • Why it works: While Neo4j’s planner is sophisticated, extremely complex queries can still confuse it or lead to suboptimal execution. Simpler queries are easier to optimize.
  5. Insufficient Memory/Heap:

    • Diagnosis: High GC activity reported by JMX metrics, slow overall system performance, metrics showing low page cache hit rates.
    • Fix: Increase Neo4j’s JVM heap size (server.memory.heap.initial_size and server.memory.heap.max_size in neo4j.conf; dbms.memory.heap.* in 4.x). Ensure sufficient system RAM is available for both heap and page cache.
    • Why it works: The heap holds query execution and transaction state, while the separately sized page cache holds graph data. An undersized heap leads to excessive garbage collection, and an undersized page cache leads to extra disk I/O. Either one cripples performance.
  6. Disk I/O Bottlenecks:

    • Diagnosis: Low page cache hit rates, high disk I/O wait times reported by the OS, query.log showing slow queries even when PROFILE doesn’t indicate obvious Cypher issues.
    • Fix: Use faster storage (SSDs). Ensure server.memory.pagecache.size in neo4j.conf (dbms.memory.pagecache.size in 4.x) is adequately set. Around 50% of system RAM is a common starting point, provided the heap and the operating system still fit in the rest.
    • Why it works: Even with a good cache hit rate, some disk access is inevitable. Slow disks will directly translate to slow query performance.
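
The memory advice in items 5 and 6 comes down to splitting RAM between the page cache, the heap, and the operating system. The sketch below shows that arithmetic only. The 50% page cache share comes from the guideline above; the 2 GiB OS reserve, the 31 GiB heap cap, and the function name memory_budget are assumptions for illustration, not official recommendations. Neo4j 5 also ships neo4j-admin server memory-recommendation, which is a better starting point for real deployments.

```python
def memory_budget(total_ram_gib, pagecache_fraction=0.5,
                  os_reserve_gib=2.0, heap_cap_gib=31.0):
    """Split total RAM (GiB) into page cache, heap and OS shares.

    All defaults are illustrative assumptions: half of RAM for the page cache,
    a fixed reserve for the OS, and a heap capped below 32 GiB.
    """
    pagecache = total_ram_gib * pagecache_fraction
    # Whatever is left after page cache and OS reserve goes to the heap, up to the cap.
    heap = min(heap_cap_gib, max(0.0, total_ram_gib - pagecache - os_reserve_gib))
    return {
        "pagecache_gib": pagecache,
        "heap_gib": heap,
        "os_gib": total_ram_gib - pagecache - heap,
    }


if __name__ == "__main__":
    for ram in (16, 128):
        print(ram, memory_budget(ram))
```

On larger machines the heap cap leaves extra memory with the OS; in practice you would likely give some of that back to the page cache if the graph is bigger than half of RAM.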

The next step after mastering query performance is understanding how to manage your graph schema and use APOC for advanced operations.
