MongoDB’s query planner is surprisingly bad at math, and this pattern is how you teach it to stop failing.
Imagine you have a collection of orders with line_items, where each line_item has price and quantity. You frequently need to find orders where the total value of line_items exceeds $1000. The naive approach is to query like this:
db.orders.find({
$expr: {
$gt: [
{
$reduce: {
input: "$line_items",
initialValue: 0,
in: {
$add: [
"$$value",
{ $multiply: ["$$this.price", "$$this.quantity"] }
]
}
}
},
1000
]
}
})
This query, while correct, forces MongoDB to perform the reduce and multiply operations for every single document it scans. If you have millions of orders, this is glacial. The query planner has no index to help it, so it has to do a full collection scan and compute the total value on the fly for each document. This is where the "computed pattern" comes in.
The core idea is to pre-calculate the expensive fields during your data ingestion or update process, and store them directly in the document. This way, when you need to query on that computed value, it’s already there, ready to be indexed.
Let’s refactor our orders collection to include a total_value field.
Schema before:
{
"_id": ObjectId("..."),
"order_date": ISODate("2023-10-27T10:00:00Z"),
"customer_id": "cust_123",
"line_items": [
{ "product_id": "prod_A", "price": 50, "quantity": 2 },
{ "product_id": "prod_B", "price": 75, "quantity": 1 }
]
}
Schema after:
{
"_id": ObjectId("..."),
"order_date": ISODate("2023-10-27T10:00:00Z"),
"customer_id": "cust_123",
"line_items": [
{ "product_id": "prod_A", "price": 50, "quantity": 2 },
{ "product_id": "prod_B", "price": 75, "quantity": 1 }
],
"total_value": 175 // Calculated: (50 * 2) + (75 * 1)
}
Now, the query becomes trivial and indexable:
db.orders.find({ "total_value": { $gt: 1000 } })
To make this efficient, you must create an index on the new total_value field:
db.orders.createIndex({ "total_value": 1 })
With this index, MongoDB can now use it to quickly locate all orders where total_value is greater than 1000 without scanning the entire collection or performing any on-the-fly calculations.
How to implement the calculation:
-
During Insertion: If you’re inserting new orders, perform the calculation in your application code before sending the document to MongoDB.
const newLineItems = [ { product_id: "prod_A", price: 50, quantity: 2 }, { product_id: "prod_B", price: 75, quantity: 1 } ]; const calculatedTotal = newLineItems.reduce((sum, item) => sum + (item.price * item.quantity), 0); db.orders.insertOne({ order_date: new Date(), customer_id: "cust_456", line_items: newLineItems, total_value: calculatedTotal // Store the pre-calculated value }); -
During Updates: If you’re updating
line_itemson an existing order, you’ll need to re-calculate and update thetotal_valuefield. This is often done within a transaction or a carefully managed update operation.const orderId = ObjectId("..."); const updatedLineItems = [ { product_id: "prod_A", price: 60, quantity: 3 }, // Prices changed { product_id: "prod_C", price: 100, quantity: 1 } // New item ]; const newCalculatedTotal = updatedLineItems.reduce((sum, item) => sum + (item.price * item.quantity), 0); db.orders.updateOne( { _id: orderId }, { $set: { line_items: updatedLineItems, total_value: newCalculatedTotal // Update the pre-calculated value } } ); -
For Existing Data (Backfill): If you’re migrating existing data or enabling this pattern on a collection with existing documents, you’ll need to run a backfill script. This script iterates through all documents, calculates the
total_value, and updates the documents.db.orders.find().forEach(order => { const calculatedTotal = order.line_items.reduce((sum, item) => sum + (item.price * item.quantity), 0); db.orders.updateOne( { _id: order._id }, { $set: { total_value: calculatedTotal } } ); });Note: For large collections, this script should be run in batches to avoid overwhelming the server and to allow for potential retries.
Why this works mechanically:
Indexes in MongoDB are B-tree structures. They store sorted values of the indexed field and pointers to the documents. When you query on a field that has an index, MongoDB can perform a very fast seek operation on the B-tree to find the relevant index entries and then directly fetch the corresponding documents. Without an index on a computed field, MongoDB has to scan every document and compute the value for each one, which is an O(N) operation. With the index, it becomes O(log N) for the index lookup plus the cost of fetching the matching documents, which is orders of magnitude faster for large datasets.
The trade-off:
The cost of this pattern is increased write complexity and slightly larger document sizes. Every time the data that contributes to the computed field changes, you must also update the computed field itself. This adds overhead to your write operations. However, for read-heavy workloads where querying on these computed fields is frequent, the performance gain on reads is usually well worth the extra effort on writes.
You’ve now optimized your queries for total_value. The next logical step is to consider how to optimize queries that involve ranges across multiple computed fields, or how to handle complex aggregations that still require on-the-fly computation.