A single MongoDB document cannot exceed 16 MB (16,777,216 bytes) once it is serialized as BSON.

This limit exists for a few reasons. First, it keeps any single document from consuming an excessive amount of memory or, when it is sent over the network, an excessive amount of bandwidth. Second, MongoDB stores and transmits documents as BSON (Binary JSON) and generally handles each document as a whole unit, so a fixed ceiling keeps the cost of serializing and deserializing any one document predictable. Finally, it encourages a more scalable document design, pushing users towards embedding small, bounded sets of related data or referencing separate documents rather than creating monolithic entries.

Let’s see how this plays out in practice. Imagine you’re building a system to store user profiles, and each profile includes a history of recent user actions.

// Example of a potentially problematic document structure
const userProfile = {
  _id: ObjectId("60c72b2f9b1e8b0c8c8c8c8c"),
  username: "jane_doe",
  email: "jane.doe@example.com",
  // ... other user fields
  actionHistory: [
    { timestamp: new Date("2023-10-26T10:00:00Z"), action: "login", details: "successful" },
    { timestamp: new Date("2023-10-26T10:05:00Z"), action: "view_profile", details: "user_id: 123" },
    { timestamp: new Date("2023-10-26T10:15:00Z"), action: "update_settings", details: "email_changed" },
    // ... potentially thousands more entries
  ]
};

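If actionHistory grows without bound, this single userProfile document will eventually exceed the 16 MB limit, and the write that pushes it over will be rejected. The server reports this as a BSONObjectTooLarge error. The exact message text depends on the server and driver version, and some drivers refuse to send an oversized document before it ever reaches the server.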
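To get a feel for how many history entries fit under the limit, the sketch below estimates BSON size by hand for a small subset of types. It assumes Node.js (for Buffer), and the names bsonSize, valueSize, and MAX_BSON_BYTES are made up for this illustration. It is not a replacement for a real serializer; in practice you would measure with your driver's BSON library instead.

```javascript
// Rough BSON size estimator for a subset of types: strings, Dates, booleans,
// null, arrays, and plain objects. The byte layout follows the public BSON spec.
// Numbers and ObjectIds are deliberately left out to keep the sketch short.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16777216

function valueSize(value) {
  if (typeof value === "string") {
    // int32 length prefix + UTF-8 bytes + trailing NUL
    return 4 + Buffer.byteLength(value, "utf8") + 1;
  }
  if (typeof value === "boolean") return 1;
  if (value === null) return 0;
  if (value instanceof Date) return 8; // int64 milliseconds since the epoch
  if (Array.isArray(value) || typeof value === "object") return bsonSize(value);
  throw new TypeError(`unsupported type in this sketch: ${typeof value}`);
}

function bsonSize(doc) {
  // Arrays are encoded as documents keyed by "0", "1", "2", ...
  const entries = Array.isArray(doc)
    ? doc.map((v, i) => [String(i), v])
    : Object.entries(doc);
  let size = 4 + 1; // int32 total-length prefix + terminating NUL
  for (const [key, value] of entries) {
    // type byte + key as a NUL-terminated string + the value itself
    size += 1 + Buffer.byteLength(key, "utf8") + 1 + valueSize(value);
  }
  return size;
}

const sampleEntry = {
  timestamp: new Date("2023-10-26T10:00:00Z"),
  action: "login",
  details: "successful"
};
const perEntry = bsonSize(sampleEntry);
// Inside an array, each slot also costs a type byte plus its index key;
// 8 bytes is an assumed allowance for a 7-digit index and its NUL.
const roughCapacity = Math.floor(MAX_BSON_BYTES / (perEntry + 1 + 8));
console.log({ perEntry, roughCapacity });
```

For entries this small the capacity comes out in the low hundreds of thousands, but a busy account with longer details strings would reach the ceiling much sooner.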

The core problem this limit addresses is resource exhaustion. Because a document is generally loaded, transmitted, and rewritten as a whole, an arbitrarily large document would make every read and update of it expensive, and a poorly designed application could degrade the whole deployment by letting one document grow without bound. The 16 MB limit acts as a failsafe, forcing developers to think about data partitioning and efficient storage patterns.

There are two primary strategies for staying under this limit: bounded embedding and referencing.

Embedding means keeping related data within the same document, but ensuring that the individual embedded arrays or sub-documents don’t grow too large. For the actionHistory example, you might decide to only store the last 100 actions directly within the user profile.

// Revised structure with limited embedding
const userProfileLimitedHistory = {
  _id: ObjectId("60c72b2f9b1e8b0c8c8c8c8c"),
  username: "jane_doe",
  email: "jane.doe@example.com",
  actionHistory: [
    { timestamp: new Date("2023-10-26T10:00:00Z"), action: "login", details: "successful" },
    // ... up to 100 actions
  ]
};
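One way to enforce such a cap is to trim the array in the same update that appends to it, using the $push operator with its $each and $slice modifiers. The helper name buildCappedPush and the default of 100 are assumptions for illustration; only the update operators themselves come from MongoDB.

```javascript
// Hypothetical helper: builds an update that appends one action and trims
// the embedded array to the newest `maxEntries` items in the same operation.
function buildCappedPush(action, maxEntries = 100) {
  return {
    $push: {
      actionHistory: {
        $each: [action],     // $slice is only valid together with $each
        $slice: -maxEntries  // a negative value keeps the last N elements
      }
    }
  };
}

const cappedUpdate = buildCappedPush({
  timestamp: new Date(),
  action: "login",
  details: "successful"
});

// In mongosh or a driver, assuming `userId` holds the profile's _id:
// db.userProfiles.updateOne({ _id: userId }, cappedUpdate);
```

Because the trim happens on the server as part of the write, the array never exceeds the cap, even with concurrent writers.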

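If you need to access older actions, you’d then use referencing. This involves storing the action history in a separate collection, where each history record carries a reference (typically the user’s _id) back to the profile it belongs to. Keeping the reference on the history side matters: a list of history _ids stored on the profile would just be another array that grows without bound.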

// One document in the separate actionLogs collection
const actionLog = {
  _id: ObjectId("60c72b2f9b1e8b0c8c8c8c8d"),
  userId: ObjectId("60c72b2f9b1e8b0c8c8c8c8c"), // Reference to userProfile
  timestamp: new Date("2023-10-25T09:00:00Z"),
  action: "legacy_operation",
  details: "completed"
};

In your application code, when you need the full action history, you’d query the userProfiles collection for the user’s basic info and recent actions, then run a separate query against the actionLogs collection, filtering by userId, to retrieve older entries. Alternatively, MongoDB’s $lookup aggregation stage can join the two collections on the server in a single round trip, which approximates the embedded experience when you need it.
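The two-query path can be sketched as a small helper that describes one page of older history. The name buildHistoryPage, the page size of 50, and the "user-123" placeholder id are assumptions for illustration; the find, sort, limit, and createIndex calls in the comments are standard mongosh methods.

```javascript
// Hypothetical helper: describes one page of a user's older history,
// newest first, strictly before a given cutoff time.
function buildHistoryPage(userId, before, pageSize = 50) {
  return {
    filter: { userId, timestamp: { $lt: before } },
    sort: { timestamp: -1 }, // newest first
    limit: pageSize
  };
}

const firstPage = buildHistoryPage("user-123", new Date("2023-10-26T00:00:00Z"));

// With a mongosh `db` handle, the two reads would look like this:
// const profile = db.userProfiles.findOne({ _id: userId });
// const older = db.actionLogs
//   .find(firstPage.filter)
//   .sort(firstPage.sort)
//   .limit(firstPage.limit)
//   .toArray();
//
// A compound index keeps this query cheap as the log grows:
// db.actionLogs.createIndex({ userId: 1, timestamp: -1 });
```

To fetch the next page, pass the timestamp of the last entry you received as the new cutoff.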

// Example using $lookup in an aggregation pipeline
db.userProfiles.aggregate([
  { $match: { _id: ObjectId("60c72b2f9b1e8b0c8c8c8c8c") } },
  {
    $lookup: {
      from: "actionLogs",         // The collection to join with
      localField: "_id",          // Field from the input documents (userProfiles)
      foreignField: "userId",     // Field from the documents of the "from" collection (actionLogs)
      as: "historicalActions"     // Output array field name
    }
  }
]);
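Each document that an aggregation pipeline returns is itself subject to the 16 MB limit, so joining an entire history into one array can fail for the same reason the embedded design did. A variant of the pipeline above that caps the joined array might look like the sketch below. It uses the sub-pipeline form of $lookup; the helper name buildBoundedProfilePipeline and the default of 500 are assumptions.

```javascript
// Hypothetical helper: the profile plus at most `maxActions` of its
// newest log entries, joined on the server.
function buildBoundedProfilePipeline(userId, maxActions = 500) {
  return [
    { $match: { _id: userId } },
    {
      $lookup: {
        from: "actionLogs",
        let: { uid: "$_id" }, // expose the profile _id to the sub-pipeline
        pipeline: [
          { $match: { $expr: { $eq: ["$userId", "$$uid"] } } },
          { $sort: { timestamp: -1 } }, // newest first
          { $limit: maxActions }        // cap the size of the joined array
        ],
        as: "historicalActions"
      }
    }
  ];
}

const boundedPipeline = buildBoundedProfilePipeline("user-123", 200);

// In mongosh: db.userProfiles.aggregate(boundedPipeline);
```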

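This approach keeps individual stored documents within the size limit while still letting you retrieve related data together. One caveat: every document an aggregation pipeline produces is also subject to the 16 MB limit, so a $lookup that pulls a user’s entire history into a single array can fail just as the embedded design did. Restrict the joined set, or page through actionLogs directly when you need the full history. The key is to decide what data is frequently accessed together and should be embedded, and what data can be accessed independently and should be in a separate collection.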

A common misconception is that you should only embed if the data is small. The real driver is access patterns. If you frequently need to read a user’s profile and their last 100 actions together, embedding those 100 actions is efficient. If you only ever need the entire action history for auditing purposes, and rarely fetch it with the user profile, then a separate collection is better. Size still matters at the margins, though: BSON stores every field name in every document, along with type and length overhead for each element, so long field names and large numbers of tiny fields add up even when the values themselves are small.

The next logical problem you’ll encounter after managing document size is optimizing queries across these potentially separate collections, often leading you to explore indexing strategies for efficient joins and lookups.
