Neo4j actually uses multiple indexing strategies under the hood, and understanding them is key to unlocking truly performant graph queries.
Let’s see it in action with a quick example. Imagine we have a Movie node with a title property.
CREATE (m1:Movie {title: "The Matrix", released: 1999, tagline: "Welcome to the Real World."})
CREATE (m2:Movie {title: "The Matrix Reloaded", released: 2003, tagline: "Free your mind."})
CREATE (m3:Movie {title: "The Matrix Revolutions", released: 2003, tagline: "Everything that has a beginning has an end."})
CREATE (m4:Movie {title: "V for Vendetta", released: 2005, tagline: "Beneath this mask there is more than just flesh. Beneath this mask there is an idea, Mr. Creedy, and ideas are bulletproof."})
If we want to find movies with "Matrix" in the title, a naive scan would check every single Movie node. But if we have an index, Neo4j can jump directly to the relevant nodes.
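To see the difference for yourself, you can ask Neo4j for the query plan. This is a minimal sketch that assumes the four Movie nodes above have been created and that no index on title exists yet; the exact operator names can vary between Neo4j versions.

```cypher
// PROFILE runs the query and reports which operators were used.
// With no usable index, expect a label scan followed by a filter
// (for example NodeByLabelScan + Filter), touching every Movie node.
PROFILE
MATCH (m:Movie)
WHERE m.title CONTAINS "Matrix"
RETURN m.title;
```

Re-running the same PROFILE after creating a suitable index should show an index-backed operator in place of the label scan.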
Neo4j’s index types are designed for different kinds of data and query patterns.
1. Range Indexes
These are your go-to for numerical and date properties, or any property where ordering and direct comparison (<, >, <=, >=) are important. Think released year, timestamp, or price.
- What they are: B-tree structures that keep property values sorted.
- When to use: When you frequently filter or sort by ranges or exact values of ordered data.
- Example: Finding movies released after 2000.
First, create the index:
CREATE INDEX FOR (m:Movie) ON (m.released);
Then, query:
MATCH (m:Movie) WHERE m.released > 2000 RETURN m.title, m.released;
Neo4j uses the released index to quickly find movies within that range.
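Range indexes also help with bounded ranges and sorting. The sketch below assumes Neo4j 5 syntax and uses a hypothetical index name, movie_released; whether the planner actually reuses the index order for ORDER BY depends on the version and the query shape.

```cypher
// Named variant of the index, safe to re-run thanks to IF NOT EXISTS.
// The name movie_released is an illustrative choice.
CREATE RANGE INDEX movie_released IF NOT EXISTS
FOR (m:Movie) ON (m.released);

// Bounded range plus ordering on the same indexed property.
MATCH (m:Movie)
WHERE 2000 <= m.released < 2005
RETURN m.title, m.released
ORDER BY m.released;
```

Naming the index makes it easier to find in SHOW INDEXES output and to drop later.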
2. Text Indexes
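These are for string properties where you need substring or suffix matching (CONTAINS, ENDS WITH) as well as exact matches. Think name, email, or status. Matching is case-sensitive; case-insensitive search is a job for full-text indexes.
- What they are: Indexes that accept only string values and are optimized for string predicates (recent Neo4j versions build them on trigrams rather than hashes).
- When to use: Exact string matching and, especially, CONTAINS and ENDS WITH filters.
- Example: Finding the movie with the exact title "The Matrix".
Create the index (the TEXT keyword matters; a plain CREATE INDEX would create a range index):
CREATE TEXT INDEX FOR (m:Movie) ON (m.title);
Query:
MATCH (m:Movie) WHERE m.title = "The Matrix" RETURN m.title;
The index allows a direct lookup for "The Matrix".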
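Text indexes are also what lets Neo4j answer the "Matrix in the title" question from earlier without scanning every node. Here is a small sketch, assuming Neo4j 5 syntax and a hypothetical index name, movie_title_text:

```cypher
// A text index only covers string values of the property.
CREATE TEXT INDEX movie_title_text IF NOT EXISTS
FOR (m:Movie) ON (m.title);

// Substring match: a predicate a text index is built to serve.
MATCH (m:Movie)
WHERE m.title CONTAINS "Matrix"
RETURN m.title;

// Suffix match works the same way.
MATCH (m:Movie)
WHERE m.title ENDS WITH "Reloaded"
RETURN m.title;
```

Both predicates are case-sensitive, so "matrix" in lowercase would not match these titles.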
3. Point Indexes
These are specialized for geospatial data, typically latitude and longitude coordinates. They enable efficient spatial queries like "find points within a certain distance."
- What they are: Spatial index structures over point values (Neo4j maps points onto a space-filling curve rather than using a quadtree).
- When to use: Location-based queries, proximity searches.
- Example: If Movie nodes had a location property (a point):
Create the index:
CREATE POINT INDEX FOR (m:Movie) ON (m.location);
Query:
MATCH (m:Movie)
WHERE point.distance(m.location, point({latitude: 34.05, longitude: -118.24})) < 10000 // within 10 km
RETURN m.title, m.location;
The point index lets Neo4j narrow down the candidate nodes before it computes exact distances. (In Neo4j 4.x the function was called distance(); Neo4j 5 uses point.distance().)
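To try a spatial query end to end, the nodes need a point value first. The following sketch assumes Neo4j 5 (where the distance function is point.distance), invents a location for one of the sample movies purely for illustration, and uses a hypothetical index name, movie_location:

```cypher
// Illustrative data only: attach a made-up WGS-84 point to one movie.
MATCH (m:Movie {title: "The Matrix"})
SET m.location = point({latitude: 34.05, longitude: -118.24});

CREATE POINT INDEX movie_location IF NOT EXISTS
FOR (m:Movie) ON (m.location);

// Proximity search: for WGS-84 points, point.distance returns meters.
MATCH (m:Movie)
WHERE point.distance(m.location, point({latitude: 34.0, longitude: -118.2})) < 10000
RETURN m.title, m.location;

// Bounding-box search, another predicate a point index can serve.
MATCH (m:Movie)
WHERE point.withinBBox(
  m.location,
  point({latitude: 33.9, longitude: -118.4}),  // lower-left corner
  point({latitude: 34.2, longitude: -118.1})   // upper-right corner
)
RETURN m.title;
```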
4. Full-Text Indexes
This is where things get powerful for natural language searching. Full-text indexes are designed for finding documents or nodes based on keywords, partial matches, and even fuzzy matching within large text fields like descriptions or synopses.
- What they are: Inverted indexes that tokenize text, normalize it (e.g., lowercase, remove punctuation), and store occurrences of each term. They support advanced search operators.
- When to use: Searching within free-form text, like descriptions, comments, or articles, where users might not know exact phrasing.
- Example: Finding movies that mention "mind" in their tagline.
Create the index:
CREATE FULLTEXT INDEX movieTaglines FOR (m:Movie) ON EACH [m.tagline];
(Note: FULLTEXT indexes are a bit different in syntax. You will want to name them, because queries refer to the index by name, and they are queried through a procedure rather than a WHERE clause.)
Query using the built-in db.index.fulltext.queryNodes procedure (no APOC required):
CALL db.index.fulltext.queryNodes("movieTaglines", "mind") YIELD node, score RETURN node.title, node.tagline, score;
This query uses the full-text index to efficiently find nodes where the tagline property contains the word "mind", even if it's part of a larger sentence, and after normalization (like lowercasing).
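The query string passed to a full-text index is interpreted with Lucene query syntax, which is where the "advanced search operators" come in. A few hedged examples, assuming an index named movieTaglines exists on the tagline property as in the example above:

```cypher
// Phrase search: the inner double quotes make Lucene match the phrase.
CALL db.index.fulltext.queryNodes("movieTaglines", '"free your mind"')
YIELD node, score
RETURN node.title, score;

// Fuzzy search: the trailing ~ tolerates small spelling differences.
CALL db.index.fulltext.queryNodes("movieTaglines", "mnid~")
YIELD node, score
RETURN node.title, node.tagline, score;

// Boolean operators combine terms.
CALL db.index.fulltext.queryNodes("movieTaglines", "mask AND idea")
YIELD node, score
RETURN node.title, score;
```

The score column is a relevance score, so results can be sorted or cut off by it.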
The real magic of full-text indexes lies in their ability to handle variations in language. Depending on the analyzer configured for the index, they can apply stemming (reducing words to a root form, so "running" and "runs" match "run") and stop word removal (ignoring common words like "the", "a", "is"). The default analyzer mainly tokenizes and lowercases, with no stemming or stop word removal, which is already enough for a search for "free your mind" to match "Free your mind."
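Which of these techniques apply depends on the analyzer the index was created with. The sketch below assumes Neo4j 5 syntax; the index name movieTextEnglish is hypothetical, and 'english' is one of the analyzers Neo4j typically ships, which you can confirm with the listing call.

```cypher
// List the analyzers this Neo4j instance offers.
CALL db.index.fulltext.listAvailableAnalyzers()
YIELD analyzer, description
RETURN analyzer, description;

// Hypothetical index over title and tagline that uses the 'english'
// analyzer, which applies English stemming and stop word removal.
CREATE FULLTEXT INDEX movieTextEnglish IF NOT EXISTS
FOR (m:Movie) ON EACH [m.title, m.tagline]
OPTIONS { indexConfig: { `fulltext.analyzer`: 'english' } };
```

The analyzer is fixed when the index is created, so switching to a different one means dropping and recreating the index.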
When a query filters on a property value, Neo4j's query planner checks whether an index can serve that predicate. If an index exists for that label and property and the query matches the index's capabilities (e.g., an exact match on a text index, a range on a range index), the planner will normally use it to drastically reduce the number of nodes it needs to examine. This is one of the most important mechanisms for query optimization in Neo4j.
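You can check both halves of that decision yourself: which indexes exist, and whether the planner picks one. A minimal sketch, assuming an index on Movie.released has already been created and is online:

```cypher
// What indexes exist, and are they ready to use?
SHOW INDEXES
YIELD name, type, labelsOrTypes, properties, state
RETURN name, type, labelsOrTypes, properties, state;

// EXPLAIN shows the plan without running the query; look for an
// index seek operator on m.released rather than a label scan.
EXPLAIN
MATCH (m:Movie)
WHERE m.released > 2000
RETURN m.title, m.released;

// If the planner picks a different plan, a hint can request the index.
// The hint fails if no matching index exists.
EXPLAIN
MATCH (m:Movie)
USING INDEX m:Movie(released)
WHERE m.released > 2000
RETURN m.title, m.released;
```

Hints are best kept as a last resort, since the planner usually makes a good choice once the right index is in place.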
A common pitfall is creating indexes on properties that are rarely queried or on low-cardinality properties (properties with very few distinct values). While not strictly harmful, they add overhead for writes and don’t provide significant lookup benefits. Conversely, forgetting to index frequently queried properties will lead to scans, making your queries slow.
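One way to spot indexes that are not earning their keep is to look at usage statistics. This sketch assumes a recent Neo4j 5 release, where SHOW INDEXES exposes the lastRead and readCount columns (older versions do not have them), and the index name in the DROP statement is purely hypothetical.

```cypher
// Indexes with a low readCount are candidates for removal.
SHOW INDEXES
YIELD name, type, labelsOrTypes, properties, lastRead, readCount
RETURN name, type, labelsOrTypes, properties, lastRead, readCount
ORDER BY readCount;

// Remove an index that turned out to be unused (hypothetical name).
DROP INDEX movie_tagline_unused IF EXISTS;
```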
The next challenge you’ll likely encounter is understanding how to combine index lookups with relationship traversals for maximum performance.