MongoDB partial indexes let you index only the documents in a collection that match a filter expression. The resulting index is smaller, cheaper to maintain on writes, and takes less memory when serving queries that target that subset of documents.
Let’s see this in action. Imagine a large orders collection where most documents have a status field set to completed. However, we frequently query for orders that are still pending.
// Sample documents in the 'orders' collection
{ "_id": ObjectId("..."), "order_id": "ORD1001", "status": "completed", "amount": 50.00, "timestamp": ISODate("2023-10-26T10:00:00Z") }
{ "_id": ObjectId("..."), "order_id": "ORD1002", "status": "pending", "amount": 75.50, "timestamp": ISODate("2023-10-26T10:05:00Z") }
{ "_id": ObjectId("..."), "order_id": "ORD1003", "status": "completed", "amount": 120.00, "timestamp": ISODate("2023-10-26T10:10:00Z") }
{ "_id": ObjectId("..."), "order_id": "ORD1004", "status": "pending", "amount": 25.00, "timestamp": ISODate("2023-10-26T10:15:00Z") }
// ... millions more documents
A standard index on status would include all documents, including the vast majority that are completed and never queried for. This wastes space and CPU.
With a partial index, we can tell MongoDB to only build the index for documents where status is pending:
db.orders.createIndex(
  { "status": 1, "timestamp": -1 },
  { partialFilterExpression: { "status": "pending" } }
)
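Now, when we run a query like db.orders.find({ status: "pending" }).sort({ timestamp: -1 }), MongoDB can use this partial index, because the query’s status: "pending" condition guarantees that every matching document is in the index. The index also supplies the sort order on timestamp. It is smaller because it only contains entries for pending orders, so more of it stays in memory, and inserts and updates that involve only completed orders never touch it.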
The core problem partial indexes solve is the overhead of indexing documents that are rarely or never part of your query patterns. A standard index holds an entry for every document in the collection, including documents that lack the indexed field (these are indexed as null), regardless of whether those documents are relevant to your common queries. This leads to:
- Increased Storage: Larger indexes consume more disk space.
- Reduced Write Performance: Every write operation to a document might require updating multiple indexes, slowing down inserts, updates, and deletes.
- Slower Query Performance: A larger index takes up more memory. When it no longer fits in RAM alongside your working set, index reads go to disk, negating some of the benefit of having the index.
Partial indexes address this by allowing you to define a filter expression. Only documents that satisfy this filter are included in the index. This is powerful because you can align your indexes precisely with your application’s query workload.
For example, if you have a collection of user profiles and you frequently query for active users (isActive: true), but rarely query for inactive ones, you could create a partial index:
db.users.createIndex(
  { "username": 1 },
  { partialFilterExpression: { "isActive": true } }
)
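This index would only contain entries for users where isActive is true, so it would be significantly smaller than a full index on username. To use it, a query must include the filter condition itself, for example db.users.find({ username: "alice", isActive: true }). A query on username alone could match inactive users, which the index does not contain, so MongoDB will not choose the partial index for it.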
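The rule that a query must imply the index filter can be sketched as a toy check. This is a deliberately simplified model, not MongoDB's query planner: it assumes the filter contains only equality conditions and that the query states them as top-level equalities. The function name queryImpliesFilter and the sample username are made up for illustration.

```javascript
// Toy model, NOT MongoDB's planner: handles only equality-only filters
// and top-level equality predicates in the query.
function queryImpliesFilter(query, filter) {
  return Object.entries(filter).every(
    ([field, value]) => query[field] === value
  );
}

// Same filter as the partial index on users above.
const activeFilter = { isActive: true };

// Names the filter condition, so every match is guaranteed to be in the index.
console.log(queryImpliesFilter({ username: "alice", isActive: true }, activeFilter)); // true

// Could match inactive users too, which the index does not contain.
console.log(queryImpliesFilter({ username: "alice" }, activeFilter)); // false
```

The real planner also reasons about ranges (for example, a query on amount greater than 200 implies a filter on amount greater than 100), which this sketch does not attempt.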
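The partialFilterExpression is written in the same syntax as a find() filter, but only a restricted set of operators is accepted: equality matches, $exists: true, the range comparisons $gt, $gte, $lt, and $lte, $type, and $and. Recent MongoDB releases also accept $or and $in. Operators such as $ne and array operators such as $elemMatch are not supported, so check the documentation for your server version before relying on a particular operator.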
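As a sketch of a filter that goes beyond simple equality, the options below combine a range comparison with an existence check under $and. The amount and order_id fields come from the sample orders shown earlier; the coupon_code field and the index name are assumptions made up for this example.

```javascript
// Index keys and options kept as plain objects so they can be inspected
// before being passed to createIndex.
const highValueKeys = { order_id: 1 };
const highValueOptions = {
  name: "high_value_with_coupon", // hypothetical index name
  partialFilterExpression: {
    $and: [
      { amount: { $gt: 100 } },           // range comparison
      { coupon_code: { $exists: true } }, // hypothetical field; $exists must be true
    ],
  },
};

// In mongosh, where `db` is the current database:
// db.orders.createIndex(highValueKeys, highValueOptions);
```

Only orders with an amount above 100 and a coupon_code field would receive an index entry.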
Consider a scenario with a logs collection where you want to index error messages for quick retrieval:
db.logs.createIndex(
  { "timestamp": 1, "level": 1, "message": 1 },
  { partialFilterExpression: { "level": "ERROR" } }
)
This index will only contain entries for log documents where the level field is exactly "ERROR". If you often search for level: "INFO" or level: "DEBUG", you would need separate indexes for those specific query patterns, or a broader index if those queries are less frequent but still significant.
You can also use partial indexes to enforce uniqueness on a subset of documents. For instance, to ensure that each email is unique only for active users:
db.users.createIndex(
  { "email": 1 },
  { partialFilterExpression: { "isActive": true }, unique: true }
)
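This will prevent duplicate email entries for documents where isActive is true. If a document has isActive: false, its email can be duplicated without violating this unique constraint. The same applies to documents that have no isActive field at all, since they do not match the filter either.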
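The scope of the uniqueness check can be illustrated with a small in-memory model. This is plain JavaScript standing in for the server's behavior, not MongoDB code; the function name wouldViolateUnique and the sample addresses are invented for the example.

```javascript
// Toy model of a unique partial index on { email: 1 } with
// partialFilterExpression { isActive: true }. Not MongoDB code.
function wouldViolateUnique(existingDocs, candidate) {
  // Documents outside the filter are not in the index, so they are never checked.
  if (candidate.isActive !== true) return false;
  return existingDocs.some(
    (doc) => doc.isActive === true && doc.email === candidate.email
  );
}

const users = [
  { email: "a@example.com", isActive: true },
  { email: "b@example.com", isActive: false },
];

// Active duplicate of an active user: rejected.
console.log(wouldViolateUnique(users, { email: "a@example.com", isActive: true })); // true
// Inactive duplicate: allowed.
console.log(wouldViolateUnique(users, { email: "a@example.com", isActive: false })); // false
// Active user whose email is shared only with an inactive user: allowed.
console.log(wouldViolateUnique(users, { email: "b@example.com", isActive: true })); // false
// No isActive field at all: allowed.
console.log(wouldViolateUnique(users, { email: "a@example.com" })); // false
```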
The key to effectively using partial indexes is understanding your query patterns. Analyze your application’s most frequent and performance-critical queries. Identify the fields and conditions that most commonly appear in your find() and sort() operations. Then, craft your partial indexes to cover those specific subsets of data.
When you create a partial index on a collection that already holds data, MongoDB scans the existing documents and adds an index entry for every one that matches the partialFilterExpression. You do not need to update or reinsert existing documents. After that, the index is maintained automatically: a document gains an entry when an insert or update makes it match the filter, and loses its entry when an update makes it stop matching. Documents that do not match the filter are simply never indexed.
One common pitfall is creating a partial index whose filter is narrower than the queries it is meant to serve. MongoDB will not use a partial index when doing so could return an incomplete result set, so the query predicate itself must restrict results to documents that match the partialFilterExpression. If you find that a query is not using the partial index as expected, double-check that the query includes the filter condition or a stricter one. MongoDB’s explain() output is invaluable here for verifying index usage.
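One way to read explain() output programmatically is to walk the winning plan and collect the index names of any IXSCAN stages. The helper below is a minimal sketch: indexesUsed is a made-up name, and samplePlan is a hand-written stand-in for a plan, not real server output. The exact shape of the plan varies between MongoDB versions, which is why the helper walks every nested value instead of relying on specific keys.

```javascript
// Collect the names of all indexes used by IXSCAN stages anywhere in a plan.
// Walks every nested object and array, so it does not depend on whether child
// stages appear under inputStage, inputStages, or another key.
function indexesUsed(plan, found = []) {
  if (plan === null || typeof plan !== "object") return found;
  if (plan.stage === "IXSCAN" && typeof plan.indexName === "string") {
    found.push(plan.indexName);
  }
  for (const child of Object.values(plan)) indexesUsed(child, found);
  return found;
}

// Hand-written stand-in for a winning plan, for illustration only.
const samplePlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "status_1_timestamp_-1" },
};
console.log(indexesUsed(samplePlan)); // one entry: "status_1_timestamp_-1"

// In mongosh, where `db` is the current database:
// const plan = db.orders.find({ status: "pending" }).sort({ timestamp: -1 })
//   .explain("queryPlanner").queryPlanner.winningPlan;
// indexesUsed(plan);
```

An empty result means no index scan appears in the plan, which usually points to a collection scan and a query that does not imply the index filter.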
After successfully implementing partial indexes, the next challenge is optimizing compound partial indexes for multi-field queries.