Neo4j composite indexes can do nothing at all for your multi-property queries, while still adding write overhead, if you don’t understand how they work.
Let’s say you have a User node with properties email and username. You want to find a user by both their email and username. A naive approach might be to create a composite index on :User(email, username) and assume that every query touching those properties will benefit.
// Filters on both indexed properties with equality, which is the case a composite index serves best
MATCH (u:User {email: 'test@example.com', username: 'tester123'})
RETURN u
Here’s how Neo4j actually handles this: when you create a composite index, Neo4j stores the indexed properties as a single, combined key, and only nodes that have every indexed property get an entry. Think of it like a compound key in a relational database. The order of properties in the index definition matters significantly: an index on :User(email, username) is a different index from one on :User(username, email).
Consider this scenario: you have a large Product dataset and you frequently query for products by category and price.
CREATE (p1:Product {name: 'Laptop', category: 'Electronics', price: 1200.00})
CREATE (p2:Product {name: 'Keyboard', category: 'Electronics', price: 75.00})
CREATE (p3:Product {name: 'Desk', category: 'Furniture', price: 300.00})
If you create a composite index like this:
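// Neo4j 4.x and later; the legacy form CREATE INDEX ON :Product(category, price) was removed in Neo4j 5
CREATE INDEX FOR (p:Product) ON (p.category, p.price)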
Neo4j will build an index whose entries are ordered by category first and then by price within each category.
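To double-check what was actually created, you can list the index along with its property order. This is a minimal sketch assuming Neo4j 4.3 or later, where SHOW INDEXES accepts YIELD and WHERE; the column names are the ones that command returns in recent versions, so adjust them if yours differs.

```cypher
// List the indexes on :Product with their ordered property lists.
// state should read ONLINE before you rely on the index in query plans.
SHOW INDEXES
YIELD name, type, labelsOrTypes, properties, state
WHERE 'Product' IN labelsOrTypes
RETURN name, type, properties, state
```

The properties column preserves the order from the index definition, which is the order the rest of this article depends on.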
Now, let’s look at how queries are executed with this index.
Query 1: Using the full composite index
MATCH (p:Product {category: 'Electronics', price: 75.00})
RETURN p
In this case, Neo4j can use the entire composite index. It seeks to 'Electronics' and, within that range of entries, to 75.00, so it pinpoints the matching nodes directly. The query plan will show NodeIndexSeek on both category and price. (NodeUniqueIndexSeek appears only when the index backs a uniqueness or node key constraint.)
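Rather than taking the plan on trust, you can ask Neo4j for it. A minimal sketch using the Product data above: EXPLAIN shows the plan without executing the query, and PROFILE executes it and reports rows and db hits per operator. The exact operator names and plan layout vary between Neo4j versions.

```cypher
// Show the planned operators without running the query.
EXPLAIN
MATCH (p:Product {category: 'Electronics', price: 75.00})
RETURN p;

// Run the query and report rows and db hits for each operator.
PROFILE
MATCH (p:Product {category: 'Electronics', price: 75.00})
RETURN p;
```

Look for the index operator at the bottom of the plan and check which properties it lists.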
Query 2: Using only the leading property of the composite index
MATCH (p:Product {category: 'Electronics'})
RETURN p
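You might expect this query to use the composite index for its leading property (category), the way a relational database can use a prefix of a compound key. In Neo4j it generally does not. A composite index only contains nodes that have both category and price, so it cannot answer a category-only query without risking missed nodes, and the planner requires a predicate on every indexed property before it will consider the index. Unless a separate index on category exists, expect NodeByLabelScan followed by a Filter. Adding an existence predicate on the trailing property (p.price IS NOT NULL) makes the composite index usable again, at the cost of excluding products that have no price. Confirm the behavior with EXPLAIN on your Neo4j version.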
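If the plan for the category-only query does not show the composite index, one option is to state explicitly that the trailing property must exist. The sketch below assumes the composite index on :Product(category, price) from above and Neo4j 4.3 or later for the IS NOT NULL syntax (older versions use exists(p.price)). Treat it as something to verify with EXPLAIN rather than a guaranteed plan.

```cypher
// Equality on the leading property plus an existence predicate on the
// trailing one, so the query has a predicate on every indexed property.
MATCH (p:Product)
WHERE p.category = 'Electronics'
  AND p.price IS NOT NULL
RETURN p
```

Note that this changes the result set: products without a price property are no longer returned.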
Query 3: Trying to use only the trailing property (inefficient)
MATCH (p:Product {price: 75.00})
RETURN p
This query cannot use the composite index (category, price). Why? Because the index is ordered by category first, so entries with price 75.00 are scattered across every category and there is no single place to seek to. To satisfy this query, Neo4j will fall back to a label scan (NodeByLabelScan) plus a filter, or use a single-property index on price if one exists, ignoring the composite index.
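If price-only lookups are common, they need an index of their own. A minimal sketch, assuming the Neo4j 4.x and later index syntax; whether the extra index is worth its write and storage cost depends on your workload.

```cypher
// Hypothetical additional index so price-only lookups have something to seek on.
CREATE INDEX FOR (p:Product) ON (p.price);

// Once that index is online, this lookup can be planned as an index seek on price.
MATCH (p:Product {price: 75.00})
RETURN p;
```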
The "Surprising Truth": Order is Everything, and Partial Use is a Trap
The most counterintuitive aspect of composite indexes is that they are ordered. An index on (A, B) is not the same as (B, A). Neo4j can seek efficiently only when your query has predicates on every property in the index, with equality predicates on the leading properties; at most one range or prefix predicate can follow them, and anything after that is treated as an existence check plus a filter. If you query for B without A, for A without B, or for A and C when the index is (A, B, C), you’re not getting the intended performance boost. Many developers assume a composite index will magically speed up any query involving its constituent properties, but this isn’t the case. It’s like a phone book sorted by last name, then first name: you can quickly find "Smith, John," but finding everyone named John regardless of last name means reading the whole book.
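To make the ordering rule concrete, here is a sketch of a query shaped to fit an index on :Product(category, price): equality on the leading property, a range on the trailing one. It assumes that index exists; the plan you actually get should be checked with EXPLAIN.

```cypher
// Equality on category (leading) narrows to one block of index entries;
// the price range (trailing) then selects a contiguous slice of that block.
MATCH (p:Product)
WHERE p.category = 'Electronics'
  AND p.price >= 50.00 AND p.price <= 100.00
RETURN p.name, p.price
ORDER BY p.price
```

Flipping the predicates (a range on category, equality on price) fits this index poorly, because the equality predicate no longer sits on the leading property; an index on (price, category) would suit that query better.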
Let’s see this in action with a slightly more complex example. Suppose we have a composite index on :User(lastName, firstName).
// Create a composite index on lastName and firstName for User nodes
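// Neo4j 4.x and later; the legacy CREATE INDEX ON :User(lastName, firstName) form was removed in Neo4j 5
CREATE INDEX FOR (u:User) ON (u.lastName, u.firstName)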
// Example data
CREATE (u1:User {firstName: 'Alice', lastName: 'Smith', age: 30})
CREATE (u2:User {firstName: 'Bob', lastName: 'Smith', age: 35})
CREATE (u3:User {firstName: 'Charlie', lastName: 'Jones', age: 25})
Efficient Query:
// Uses the leading property (lastName) and the next property (firstName)
MATCH (u:User {lastName: 'Smith', firstName: 'Alice'})
RETURN u
Neo4j’s query plan will likely show NodeIndexSeek on :User(lastName, firstName), indicating it’s using both parts of the index to find the record directly. NodeUniqueIndexSeek would appear only if the index backed a uniqueness constraint, which is not the case here.
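If you want to be certain a query is served by a particular index, a planner hint is one way to test it. The sketch below assumes the composite index on :User(lastName, firstName) created above. As far as I know, Neo4j rejects the query with an error when it cannot fulfil the hint, which makes this a quick applicability check.

```cypher
// Ask the planner to use the composite index for this lookup.
MATCH (u:User)
USING INDEX u:User(lastName, firstName)
WHERE u.lastName = 'Smith' AND u.firstName = 'Alice'
RETURN u
```

Hints are best kept for diagnosis; in normal operation the planner should pick the index on its own.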
Leading-Property-Only Query:
// Uses only the leading property (lastName)
MATCH (u:User {lastName: 'Smith'})
RETURN u
The query plan here will generally not show a seek on the composite index. Because the query has no predicate on firstName, Neo4j cannot use :User(lastName, firstName); without a separate index on lastName it falls back to NodeByLabelScan plus a Filter. To bring the composite index back into play, add a predicate on firstName, even if it is only u.firstName IS NOT NULL.
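When the full first name is not known, a prefix predicate on the trailing property still gives the query something to say about both indexed properties. A minimal sketch against the sample User data, assuming the composite index on :User(lastName, firstName) exists:

```cypher
// Equality on lastName (leading), prefix search on firstName (trailing).
MATCH (u:User)
WHERE u.lastName = 'Smith'
  AND u.firstName STARTS WITH 'A'
RETURN u.firstName, u.lastName
```

Run it under EXPLAIN to see whether your Neo4j version plans it as a seek on both properties.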
Inefficient Query (for this index):
// Tries to use firstName without lastName - this will NOT use the composite index efficiently
MATCH (u:User {firstName: 'Alice'})
RETURN u
If there’s no other index on firstName, Neo4j will likely perform a NodeByLabelScan on :User and then filter the scanned nodes, or use a single-property index on firstName if one exists. The composite index (lastName, firstName) offers no benefit here.
The One Thing Most People Don’t Know: Index Cardinality and Selectivity
While order is paramount, the cardinality and selectivity of the properties within the composite index also play a huge role. If the leading property (category in our Product example) has very low cardinality (only a few distinct values, like 'Electronics', 'Furniture', 'Clothing'), it is not very selective on its own: with millions of electronic products, the category part of the key still leaves a huge range of entries, and the second property (price) does most of the narrowing. Conversely, if the leading property were highly selective (e.g., a unique product ID), the trailing property would add little, and a single-property index might serve just as well. Understanding your data’s distribution helps you choose the right properties and the right order.
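A quick way to see that distribution is to count values before settling on an index. These are plain aggregation queries over the Product nodes used earlier; on a large graph they scan every :Product node, so run them off-peak or on a sample.

```cypher
// Nodes per category: a few very large groups mean category alone is not selective.
MATCH (p:Product)
RETURN p.category AS category, count(*) AS products
ORDER BY products DESC;

// Distinct values per property compared with the total number of nodes.
MATCH (p:Product)
RETURN count(*) AS total,
       count(DISTINCT p.category) AS categories,
       count(DISTINCT p.price) AS prices;
```

The closer a property's distinct count is to the total, the more selective it is.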
The next concept you’ll likely grapple with is how to handle queries that don’t align with your composite index order, leading you to consider multiple indexes or different indexing strategies.