You can avoid reading entire documents by using covered queries in MongoDB, which means the database can answer your query using only the index, without ever touching the actual documents.
Let’s see this in action. Imagine you have a collection named users with documents like this:
{
  "_id": ObjectId("..."),
  "username": "alice",
  "email": "alice@example.com",
  "status": "active",
  "lastLogin": ISODate("2023-10-27T10:00:00Z")
}
And you want to find the username and email of all active users. A standard query might look like this:
db.users.find(
  { "status": "active" },
  { "username": 1, "email": 1, "_id": 0 }
)
If you have an index on status, MongoDB will use it to find the matching documents. However, it still needs to fetch each matching document (a FETCH stage in the query plan) to extract the username and email fields, because the index holds only status.
Now, let’s create a compound index that includes all the fields we need for both the query and the projection:
db.users.createIndex({ "status": 1, "username": 1, "email": 1 })
With this index in place, the same query now behaves differently:
db.users.find(
  { "status": "active" },
  { "username": 1, "email": 1, "_id": 0 }
)
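If you examine the query plan (db.users.find(...).explain("executionStats")), you’ll see an IXSCAN (index scan) stage with no FETCH stage above it; recent MongoDB versions report the projection stage as PROJECTION_COVERED. Crucially, totalDocsExamined is 0, which confirms the results came directly from the index. This is a covered query.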
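The plan check can be automated. The sketch below walks an explain() result and reports whether the winning plan reads only from an index. The helper names (collectStages, isCoveredPlan) and the sample plan object are illustrative assumptions, not part of MongoDB. It assumes stages nest under inputStage or inputStages, and that some versions wrap the plan tree in a queryPlan field.

```javascript
// Hypothetical helper: collect every stage name in an explain() plan tree.
// Assumes stages nest via `inputStage` (one child) or `inputStages` (array).
function collectStages(plan, out = []) {
  if (!plan || typeof plan !== "object") return out;
  if (plan.stage) out.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, out);
  if (Array.isArray(plan.inputStages)) {
    for (const child of plan.inputStages) collectStages(child, out);
  }
  return out;
}

// Treat a plan as covered when it scans an index and never fetches
// or scans documents.
function isCoveredPlan(explainOutput) {
  const winning = explainOutput?.queryPlanner?.winningPlan;
  // Assumption: some versions nest the plan tree under `queryPlan`.
  const stages = collectStages(winning?.queryPlan ?? winning);
  return (
    stages.includes("IXSCAN") &&
    !stages.includes("FETCH") &&
    !stages.includes("COLLSCAN")
  );
}

// Hand-written sample shaped like explain() output, not captured from a server.
const samplePlan = {
  queryPlanner: {
    winningPlan: {
      stage: "PROJECTION_COVERED",
      inputStage: { stage: "IXSCAN", indexName: "status_1_username_1_email_1" },
    },
  },
};

console.log(isCoveredPlan(samplePlan)); // true
```

In mongosh you could pass a real result instead of the sample, for example isCoveredPlan(db.users.find(...).explain()). Checking totalDocsExamined in the executionStats output remains the most direct confirmation.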
The problem this solves is performance, especially with large collections. When a query is covered, MongoDB skips the per-document fetch, which on a large collection often means random disk I/O. Instead, it reads the values directly from the index, which is usually much smaller than the collection and therefore more likely to already be in memory. This can substantially reduce query latency and I/O load.
Internally, for a query to be covered, two main conditions must be met:
- The query predicate (the filter part) must be fully supported by the index. This means every field in the query filter must be one of the indexed fields. For the planner to choose the index in the first place, the filter should also use a prefix of the index keys. For example, an index on { a: 1, b: 1 } can cover queries on { a: "value1" } or { a: "value1", b: "value2" }, but a query on { b: "value2" } alone will not normally use that index at all.
- The projection (the fields part) must only include fields that are part of the index. This includes the _id field, which is returned by default unless explicitly excluded ("_id": 0). If you project fields not present in the index, even if they are part of the documents, the query cannot be covered.
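The two conditions can be expressed as a small check. This is a simplified model, not MongoDB's planner. The function name canBeCovered is hypothetical, and it assumes top-level fields and an inclusion projection. It ignores whether the planner would actually select the index, and it ignores other restrictions such as indexed array fields.

```javascript
// Simplified model of the two coverage conditions (not MongoDB's planner).
// indexKeys:  e.g. { status: 1, username: 1, email: 1 }
// filter:     top-level fields only, e.g. { status: "active" }
// projection: inclusion projection, e.g. { username: 1, email: 1, _id: 0 }
function canBeCovered(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));

  // Condition 1: every filtered field is in the index.
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));

  // Condition 2: every returned field is in the index.
  const included = Object.keys(projection).filter((f) => projection[f]);
  // Without any included field, whole documents come back.
  if (included.length === 0) return false;
  // _id is returned unless the projection mentions it explicitly.
  const returned = "_id" in projection ? included : [...included, "_id"];
  const projectionOk = returned.every((f) => indexed.has(f));

  return filterOk && projectionOk;
}

const idx = { status: 1, username: 1, email: 1 };
const filter = { status: "active" };

console.log(canBeCovered(idx, filter, { username: 1, email: 1, _id: 0 }));     // true
console.log(canBeCovered(idx, filter, { username: 1, email: 1 }));             // false (implicit _id)
console.log(canBeCovered(idx, filter, { username: 1, lastLogin: 1, _id: 0 })); // false (lastLogin not indexed)
```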
Let’s break down the levers you control:
- Index Definition: This is paramount. The order and fields in your compound index dictate what can be covered. You want to include fields used in your filters first, followed by fields used in your projections.
- Query Filter: Filter only on fields that are in the index, starting from its leading field so the planner selects it.
- Projection: Be mindful of what you’re projecting. Only project fields that are part of the index definition. If you need a field not in the index, the query won’t be covered. Explicitly excluding _id ("_id": 0) is common when you don’t need it and it’s not part of your index; with every other projected field in the index, that exclusion is what makes the query coverable.
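As a rough illustration of the index-definition lever, the sketch below derives a compound index spec from the fields a query filters on and the fields it returns, with filter fields first. The helper name coveringIndexSpec is hypothetical, and ascending order (1) for every key is an assumption made for simplicity.

```javascript
// Hypothetical helper: build a compound index spec with filter fields first,
// then projected fields, skipping duplicates. All keys ascending (assumption).
function coveringIndexSpec(filterFields, projectedFields) {
  const spec = {};
  const seen = new Set();
  for (const field of [...filterFields, ...projectedFields]) {
    if (seen.has(field)) continue;
    seen.add(field);
    spec[field] = 1;
  }
  return spec;
}

const suggested = coveringIndexSpec(["status"], ["username", "email"]);
console.log(JSON.stringify(suggested)); // {"status":1,"username":1,"email":1}
```

In mongosh the result could be passed straight to db.users.createIndex(suggested). Keep in mind that every extra key makes the index larger and adds work to each write, so a covering index is a trade-off and not a free win.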
A common pitfall is forgetting about the _id field. If your index doesn’t include _id (which it won’t, unless you explicitly add it as one of the index keys), and you don’t exclude it in your projection, MongoDB has to fetch it from the document, breaking coverage. So, if your index is { "status": 1, "username": 1 } and you query for { "status": "active" } projecting { "username": 1 }, the query is not covered because _id is implicitly projected. To make it covered, you’d need to add "_id": 0 to your projection.
Consider an index on { status: 1, username: 1, email: 1 }. A query like db.users.find({ status: "active" }, { username: 1, email: 1 }) will not be covered because _id is implicitly projected and not part of this index. However, db.users.find({ status: "active" }, { username: 1, email: 1, _id: 0 }) will be covered.
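When a projection unexpectedly breaks coverage, it helps to list exactly which returned fields the index cannot supply. The diagnostic below is a minimal sketch for top-level fields and inclusion projections; fieldsOutsideIndex is a hypothetical name, not a MongoDB function.

```javascript
// Hypothetical diagnostic: list returned fields that the index cannot supply.
function fieldsOutsideIndex(indexKeys, projection) {
  const returned = Object.keys(projection).filter((f) => projection[f]);
  // _id is returned implicitly unless the projection mentions it.
  if (!("_id" in projection)) returned.push("_id");
  return returned.filter((f) => !Object.hasOwn(indexKeys, f));
}

const index = { status: 1, username: 1, email: 1 };

console.log(fieldsOutsideIndex(index, { username: 1, email: 1 }));         // [ '_id' ]
console.log(fieldsOutsideIndex(index, { username: 1, email: 1, _id: 0 })); // []
```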
When you have a query that you expect to be covered, but explain() shows it’s not, always check your projection against the fields in the index, including the implicit _id. The next hurdle is often understanding how to leverage partial indexes for even more efficient coverage when you don’t need to index every document.