Neo4j’s page cache keeps in-memory copies of the store files on disk: nodes, relationships, properties, and indexes. If it’s too small to hold the part of the graph your queries actually touch, reads fall through to disk, the disk becomes the bottleneck, and queries grind to a crawl.
Let’s see it in action. Imagine you have a modest Neo4j instance, and you’re running a common "find friends of friends" query:
MATCH (p:Person {name: 'Alice'})-[:FRIENDS_WITH]->()-[:FRIENDS_WITH]->(friendOfFriend)
RETURN DISTINCT friendOfFriend.name
If your page cache is undersized, Neo4j will repeatedly have to fetch data blocks from disk. You’ll see high disk I/O, low cache hit ratios, and slow query times.
Now, let’s scale that cache up. We’ll modify the neo4j.conf file, specifically the dbms.memory.pagecache.size setting.
Here’s how Neo4j uses the page cache. When a query needs data, Neo4j first checks the page cache. If the data is there (a cache hit), it’s returned lightning fast. If not (a cache miss), Neo4j reads the data from disk into the page cache, then returns it. Subsequent requests for that same data will then be cache hits. The goal is to keep the "working set" – the data actively used by your queries – in the page cache.
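To make the hit/miss mechanics concrete, here is a minimal Python sketch of a toy page cache. It is an illustration only: it assumes strict least-recently-used eviction and treats pages as bare integers, which is a simplification and not a description of Neo4j's real implementation. The names ToyPageCache and replay are made up for this example.

```python
from collections import OrderedDict


class ToyPageCache:
    """Fixed-capacity cache of page ids with LRU eviction (a simplification)."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()
        self.hits = 0
        self.faults = 0

    def read(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
        else:
            self.faults += 1  # stands in for a read from disk
            self.pages[page_id] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used

    def hit_ratio(self):
        total = self.hits + self.faults
        return self.hits / total if total else 0.0


def replay(capacity_pages, working_set_pages, passes):
    """Scan the same working set several times and report the hit ratio."""
    cache = ToyPageCache(capacity_pages)
    for _ in range(passes):
        for page_id in range(working_set_pages):
            cache.read(page_id)
    return cache.hit_ratio()
```

Replaying a 100-page working set ten times gives a hit ratio of 0.9 when the cache holds all 100 pages (only the first pass faults). With room for just 50 pages the ratio drops to 0.0, because a cyclic scan under LRU evicts each page shortly before it is needed again. Real workloads are less extreme, but the cliff is the same one you hit when the working set outgrows the cache.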
The primary lever you have is dbms.memory.pagecache.size (named server.memory.pagecache.size in Neo4j 5). It is not a dynamic setting: after you change it in neo4j.conf, a restart is needed for the change to take effect.
Diagnosis:
Before tuning, you need to understand your current situation.
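- Check Cache Hit Ratio: The most telling metric.
  - Command: CALL dbms.monitor.get_page_cache_stats()
  - What to look for: page_cache_hit_ratio. A ratio consistently below 0.9 (90%) under a steady workload indicates that Neo4j is frequently missing data in the cache and going to disk.
  - If that procedure is not available in your Neo4j version: the page cache metrics exposed by Enterprise Edition and the page cache hits and misses reported by PROFILE give the same signal.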
- Monitor System Resources:
  - Command (Linux): iostat -x 1
  - What to look for: High %util and await for your disk device (some sysstat versions report r_await and w_await instead of await). This signals disk contention, a common consequence of an undersized page cache.
- Neo4j Logs:
  - Location: $NEO4J_HOME/logs/neo4j.log
  - What to look for: Messages indicating slow disk operations or frequent page faults.
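Counters that accumulate since startup can hide a recent drop in cache efficiency, so it helps to compute the hit ratio over an interval. The sketch below assumes you can sample cumulative hit and fault counts from whatever monitoring source you use, and that the ratio is defined as hits / (hits + faults). The function and parameter names are hypothetical.

```python
def hit_ratio(hits, faults):
    """Return hits / (hits + faults), or None when there was no activity."""
    total = hits + faults
    return hits / total if total else None


def interval_hit_ratio(earlier, later):
    """Hit ratio between two samples of cumulative (hits, faults) counters."""
    return hit_ratio(later[0] - earlier[0], later[1] - earlier[1])


def needs_attention(ratio, threshold=0.9):
    """Flag ratios below the 0.9 threshold used in the diagnosis above."""
    return ratio is not None and ratio < threshold
```

For example, samples of (1000, 100) and then (1800, 300) mean 800 hits and 200 faults in the interval, a ratio of 0.8, which needs_attention flags.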
Tuning Steps:
- Estimate Your Working Set: This is the crucial, often tricky part.
  - Method: A good starting point is the total size of your graph data files in $NEO4J_HOME/data/databases/<your_db_name>/. This usually overestimates the working set, but it gives you an upper bound. A more refined estimate comes from observing dbms.monitor.get_page_cache_stats() over time: compare page_cache_size_in_bytes with the total data size. If your cache is significantly smaller than your total data but still has a good hit ratio, you’re likely in a good spot. If it’s small and the hit ratio is bad, you need more.
- Set dbms.memory.pagecache.size:
  - File: $NEO4J_HOME/conf/neo4j.conf
  - Setting: dbms.memory.pagecache.size=8G (example: 8 gigabytes)
  - Guideline: Allocate a significant portion of your available RAM to the page cache, but only after reserving memory for the OS and for the Neo4j JVM heap (dbms.memory.heap.max_size). A common recommendation is to leave 2-4GB for the OS and other processes, subtract the heap, and dedicate the rest to the page cache. For a server with 32GB RAM, dbms.memory.pagecache.size=28G leaves only 4GB for everything else, heap included; with an 8GB heap and 4GB reserved for the OS, 20G is a safer starting point.
  - Why it works: By increasing the size, you increase the probability that frequently accessed data blocks reside in RAM, dramatically reducing disk I/O and speeding up query execution.
- Restart Neo4j:
  - Command (systemd): sudo systemctl restart neo4j
  - Command (init.d): sudo service neo4j restart
  - Why it works: Configuration changes in neo4j.conf require a service restart to be loaded.
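- Re-evaluate Cache Hit Ratio:
  - Command: CALL dbms.monitor.get_page_cache_stats()
  - What to look for: A page_cache_hit_ratio that has significantly improved, ideally above 0.95. Measure once the cache has warmed up under a representative workload; right after a restart the cache may still be cold and the ratio misleadingly low.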
- Monitor System Resources Again:
  - Command (Linux): iostat -x 1
  - What to look for: A noticeable decrease in disk I/O (%util, await).
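The sizing arithmetic from the steps above can be sketched in a few lines of Python. This is a rough starting point, not an official formula: the function names are hypothetical, and the 1.2 growth factor is an assumed headroom for store growth that you should adjust to your own data.

```python
import os

GIB = 1024 ** 3


def directory_size_bytes(path):
    """Sum the sizes of all regular files under path (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def suggest_page_cache_gib(total_ram_gib, heap_gib, os_reserve_gib,
                           store_gib, growth_factor=1.2):
    """Suggest a page cache size in GiB.

    The cache never needs to exceed the store size plus headroom, and it
    must fit in the RAM left after the OS reserve and the JVM heap.
    """
    available = total_ram_gib - heap_gib - os_reserve_gib
    if available <= 0:
        raise ValueError("heap and OS reserve already use all RAM")
    wanted = store_gib * growth_factor
    return min(available, wanted)
```

You would feed directory_size_bytes the database directory (divided by GIB) as store_gib. On a 32GB machine with an 8GB heap and 4GB reserved for the OS, a 50GB store is capped at the 20GB that is actually available, while a 10GB store only asks for about 12GB.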
Common Pitfalls and Advanced Considerations:
- Over-allocation: Don’t allocate more RAM to the page cache than your system has available once the heap and the OS are accounted for. This will lead to the OS swapping, which is even worse than ordinary page cache misses.
- Graph Size vs. Working Set: If your entire graph fits comfortably within RAM (i.e., your working set is your entire graph), then dbms.memory.pagecache.size should be large enough to hold the whole store with some headroom for growth, while still leaving room for the OS, the heap, and other processes.
- Mixed Workloads: If you have very different types of queries (e.g., heavy writes vs. heavy reads), tuning might involve more than just the page cache. Transaction logs, heap size, and thread pools become more important.
- SSD vs. HDD: The impact of page cache tuning is far more dramatic on HDDs than on SSDs, though SSDs still benefit immensely.
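The over-allocation pitfall is easy to check mechanically. The sketch below is an assumption-laden illustration: it only understands plain key=value lines and simple size suffixes (k, m, g), it expects both the heap and page cache settings to be set explicitly, and the helper names are hypothetical. Pass different key names if your version uses the server.memory.* settings.

```python
import re

GIB = 1024 ** 3
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": GIB}


def parse_size(text):
    """Convert simple size strings such as '8G', '512m' or '1048576' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.strip().lower())
    if match is None:
        raise ValueError(f"unsupported size format: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit]


def read_settings(conf_text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def memory_budget(conf_text, total_ram_bytes, os_reserve_bytes,
                  heap_key="dbms.memory.heap.max_size",
                  cache_key="dbms.memory.pagecache.size"):
    """Add up heap, page cache and OS reserve and compare with total RAM."""
    settings = read_settings(conf_text)
    heap = parse_size(settings[heap_key])
    cache = parse_size(settings[cache_key])
    committed = heap + cache + os_reserve_bytes
    return {
        "heap": heap,
        "page_cache": cache,
        "committed": committed,
        "over_allocated": committed > total_ram_bytes,
    }
```

With an 8g heap, a 28G page cache, and 2GB reserved for the OS, the committed total is 38GB, so on a 32GB machine the result has over_allocated set to True.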
When you’ve correctly tuned your page cache, you’ll observe that dbms.monitor.get_page_cache_stats() shows a page_cache_hit_ratio consistently above 0.95, and your disk I/O metrics (iostat) will show significantly reduced utilization and wait times.
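If you want to track the iostat side of this check over time instead of eyeballing it, a small parser is enough. This sketch assumes the extended report contains a header line starting with "Device" followed by one whitespace-separated row per device; column names differ between sysstat versions, so it reads them from the header instead of hard-coding positions. The function names are hypothetical.

```python
def parse_iostat_devices(report):
    """Map device name -> {column: value} from `iostat -x` text output.

    If the text contains several reports, later rows overwrite earlier ones.
    """
    columns = None
    rows = {}
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the device table
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]
            continue
        if columns is not None and len(fields) == len(columns) + 1:
            try:
                values = [float(v) for v in fields[1:]]
            except ValueError:
                continue  # not a numeric device row
            rows[fields[0]] = dict(zip(columns, values))
    return rows


def busy_devices(rows, util_threshold):
    """Return the sorted names of devices whose %util meets the threshold."""
    return sorted(name for name, cols in rows.items()
                  if cols.get("%util", 0.0) >= util_threshold)
```

Running it on captures taken before and after the change lets you compare %util and the await columns for the device that holds your Neo4j data directory.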
The next bottleneck you’ll likely encounter after a well-tuned page cache is CPU, especially for complex graph traversals.