Multikey indexes can make your queries incredibly slow if you don’t understand how they work under the hood.

Let’s see what happens when we query an array field with a multikey index.

// Assume we have a collection 'users' with documents like:
// { _id: 1, name: "Alice", tags: ["developer", "javascript"] }
// { _id: 2, name: "Bob", tags: ["designer", "css"] }
// { _id: 3, name: "Charlie", tags: ["developer", "python"] }

// We create a multikey index on the 'tags' array:
db.users.createIndex({ tags: 1 })

// Now, let's query for users with the tag "developer":
db.users.find({ tags: "developer" })

MongoDB builds the entries of a multikey index when documents are written, not when a query runs. For each element of an indexed array, it stores a separate index entry pointing to the document. So, for the document { _id: 1, name: "Alice", tags: ["developer", "javascript"] }, the tags index holds two entries: one for "developer" and one for "javascript", both pointing to _id: 1. You do not declare an index as multikey; MongoDB marks it as multikey automatically once any indexed document holds an array in that field.
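To picture those entries, here is a minimal sketch in plain JavaScript. It is a simplified model, not MongoDB’s actual storage format, and buildMultikeyEntries is a hypothetical helper invented for this illustration.

```javascript
// Simplified model of multikey index entries (not MongoDB's real on-disk format).
const users = [
  { _id: 1, name: "Alice", tags: ["developer", "javascript"] },
  { _id: 2, name: "Bob", tags: ["designer", "css"] },
  { _id: 3, name: "Charlie", tags: ["developer", "python"] },
];

// Hypothetical helper: emit one { key, _id } entry per array element.
// This sketch collapses duplicate elements within a single document.
function buildMultikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? [...new Set(value)] : [value];
    for (const key of keys) {
      entries.push({ key, _id: doc._id });
    }
  }
  // An index keeps its entries ordered by key; sorting imitates that here.
  return entries.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a._id - b._id
  );
}

const tagEntries = buildMultikeyEntries(users, "tags");
console.log(tagEntries.length); // 6 entries for 3 documents
console.log(tagEntries.filter((e) => e.key === "developer").map((e) => e._id)); // [ 1, 3 ]
```

Three documents produce six entries, which is why a multikey index grows with array length rather than with document count.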

When you query db.users.find({ tags: "developer" }), MongoDB looks up "developer" in the index and finds two entries, one pointing to _id: 1 and one pointing to _id: 3. It then fetches those two documents and returns them; Bob’s document is never touched. For a simple equality match on a single field, the lookup works much like one on a scalar field.

Things get more subtle with compound indexes that include a multikey field. Consider an index on { tags: 1, age: 1 }. If a document has tags: ["developer", "javascript"] and age: 30, MongoDB creates index entries for ("developer", 30) and ("javascript", 30). If you query db.users.find({ tags: "developer", age: 30 }), MongoDB can efficiently use the compound index. However, db.users.find({ tags: ["developer", "javascript"], age: 30 }) is not a search for either tag: passing an array as the value asks for an exact match on the whole array, in that order. The index stores individual elements rather than whole arrays, so MongoDB can use it only to look up the first element of the query array ("developer"); it must then fetch each candidate document and filter for the exact array. A document with tags: ["javascript", "developer"] does not match at all. If you mean "has both tags", use $all; if you mean "has either tag", use $in.
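The difference between the two query shapes is easy to miss, so here is a small sketch of the matching semantics in plain JavaScript. The documents and the helpers matchesElement and matchesExactArray are made up for illustration; they model what the queries mean, not how MongoDB executes them, and they ignore nested arrays.

```javascript
// Hypothetical documents; the age values are assumed for this example.
const people = [
  { _id: 1, tags: ["developer", "javascript"], age: 30 },
  { _id: 2, tags: ["javascript", "developer"], age: 30 },
  { _id: 3, tags: ["developer"], age: 30 },
];

// Models { tags: "developer", age: 30 }: any element may equal the value.
function matchesElement(doc, tag, age) {
  return doc.tags.includes(tag) && doc.age === age;
}

// Models { tags: ["developer", "javascript"], age: 30 }:
// the whole array must be equal, element by element, in the same order.
function matchesExactArray(doc, tags, age) {
  const sameArray =
    doc.tags.length === tags.length &&
    doc.tags.every((t, i) => t === tags[i]);
  return sameArray && doc.age === age;
}

const elementIds = people
  .filter((d) => matchesElement(d, "developer", 30))
  .map((d) => d._id);
const exactIds = people
  .filter((d) => matchesExactArray(d, ["developer", "javascript"], 30))
  .map((d) => d._id);

console.log(elementIds); // [ 1, 2, 3 ]
console.log(exactIds); // [ 1 ]
```

Only the document whose array is exactly ["developer", "javascript"] survives the second filter, even though all three contain "developer".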

Querying for several values against a single multikey field is handled well. If you query db.users.find({ tags: { $in: ["developer", "designer"] } }), MongoDB will use the multikey index on tags. It scans the index once for "developer" and once for "designer", then combines the results, returning each document only once even if it matches both values. The cost is roughly proportional to the number of index entries that exist for the listed values, so very common tags make the query more expensive than rare ones.
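As a rough sketch of that scan-and-combine step, the snippet below models the tags index as a hand-written list of (key, _id) pairs for the three sample users. lookupIn is a hypothetical helper, and the linear scan stands in for what would really be an ordered index seek.

```javascript
// Hand-written model of the index entries for the three sample users.
const tagIndex = [
  { key: "css", _id: 2 },
  { key: "designer", _id: 2 },
  { key: "developer", _id: 1 },
  { key: "developer", _id: 3 },
  { key: "javascript", _id: 1 },
  { key: "python", _id: 3 },
];

// Hypothetical helper: probe the index once per $in value and
// collect document ids in a Set so each document appears once.
function lookupIn(indexEntries, values) {
  const ids = new Set();
  for (const value of values) {
    for (const entry of indexEntries) {
      if (entry.key === value) ids.add(entry._id);
    }
  }
  return [...ids].sort((a, b) => a - b);
}

console.log(lookupIn(tagIndex, ["developer", "designer"])); // [ 1, 2, 3 ]
// Alice matches both values below but is still returned once.
console.log(lookupIn(tagIndex, ["developer", "javascript"])); // [ 1, 3 ]
```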

Do not confuse a multikey index with the positional operator. The $ positional operator applies to updates and projections; it is not an index type, so a key like { "tags.$": 1, "status": 1 } does not give you a positional index. Likewise, a query on a specific position such as { "tags.0": "developer", "status": "active" } is not served by the multikey index on tags, because the index entries record which values an array contains, not where they sit in the array. A true multikey index on tags is simply { tags: 1 }.

The most common pitfall is underestimating the cost of a compound index on an array field that you query with $in or a similar operator. For instance, with an index on { user_ids: 1, order_date: 1 } where user_ids is an array, every document contributes one index entry per element of user_ids, so the index grows with array length rather than with document count. Querying db.orders.find({ user_ids: { $in: [101, 102] }, order_date: ISODate("2023-10-27") }) makes MongoDB probe the index for (101, date) and for (102, date); a document whose array contains both values is found twice and has to be deduplicated before it is returned. With long arrays or long $in lists, the number of index entries examined can exceed the number of documents returned, which you can check by comparing totalKeysExamined with nReturned in the output of explain("executionStats").
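To get a feel for the numbers, here is a small plain-JavaScript model of such a compound index. The order documents, the string dates, and the helpers buildCompoundEntries and probeIndex are all assumptions made for this sketch; it counts entries in a way loosely analogous to comparing keys examined with documents returned, and it is not MongoDB’s query planner.

```javascript
// Hypothetical orders; dates are plain strings to keep the sketch simple.
const orders = [
  { _id: "a", user_ids: [101, 102], order_date: "2023-10-27" },
  { _id: "b", user_ids: [101, 205], order_date: "2023-10-27" },
  { _id: "c", user_ids: [102], order_date: "2023-10-26" },
  { _id: "d", user_ids: [300, 301, 302], order_date: "2023-10-27" },
];

// One (userId, orderDate) entry per element of user_ids.
function buildCompoundEntries(docs) {
  return docs.flatMap((doc) =>
    doc.user_ids.map((userId) => ({
      userId,
      orderDate: doc.order_date,
      _id: doc._id,
    }))
  );
}

// Count matching index entries versus distinct documents returned.
function probeIndex(entries, userIds, orderDate) {
  const hits = entries.filter(
    (e) => userIds.includes(e.userId) && e.orderDate === orderDate
  );
  const ids = [...new Set(hits.map((e) => e._id))];
  return {
    totalEntries: entries.length,
    entriesMatched: hits.length,
    docsReturned: ids.length,
    ids,
  };
}

const orderEntries = buildCompoundEntries(orders);
const probeResult = probeIndex(orderEntries, [101, 102], "2023-10-27");
console.log(probeResult);
// { totalEntries: 8, entriesMatched: 3, docsReturned: 2, ids: [ 'a', 'b' ] }
```

Four documents yield eight index entries, and order "a" is matched through both 101 and 102 yet returned once.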

To optimize queries involving multiple array elements, consider whether you can reshape your data. If you frequently query for specific combinations of array elements, it may be better to store that combination explicitly or to use a different indexing strategy. For example, if documents have tags: ["developer", "javascript"] and you often query for developer AND javascript (with $all), the multikey index typically narrows the search by only one of the tags and the rest is filtered after the documents are fetched. A separate scalar field such as primary_tag: "developer", indexed on its own, or another denormalized field can make that lookup cheaper.

The next logical step is exploring how to work with specific elements within an array using the positional operator ($).
