MongoDB’s db.collection.stats() helper, a shell wrapper around the collStats command, reports how a collection’s data and indexes are actually stored. It often explains disk usage and performance problems you wouldn’t otherwise suspect.

Let’s see it in action. Imagine you have a users collection. Running db.users.stats() might give you output like this (simplified for clarity):

{
  "ns" : "mydatabase.users",
  "size" : 123456789,
  "count" : 1000000,
  "avgObjSize" : 123,
  "storageSize" : 200000000,
  "numExtents" : 5,
  "nindexes" : 2,
  "totalIndexSize" : 50000000,
  "indexSizes" : {
    "_id_" : 25000000,
    "email_1" : 25000000
  },
  "ok" : 1
}

Here’s what’s actually happening under the hood, and what you can learn:

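  • size: The total uncompressed size of all documents in the collection, in bytes. This is the logical size of your data, not what it occupies on disk.
  • storageSize: The disk space allocated to the collection for document storage, in bytes. It does not include indexes. With WiredTiger’s block compression it is often smaller than size, but it can also be larger when the data file holds free space left behind by deletes and updates. This is the number that reflects the collection’s data footprint on disk.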
  • count: The number of documents in the collection. Simple enough.
  • avgObjSize: The average size of a single document. Useful for understanding if your documents are growing unexpectedly.
  • numExtents: The number of extents, or contiguous blocks of disk space, allocated to the collection. This field is specific to the legacy MMAPv1 storage engine, where many extents could indicate fragmentation. WiredTiger does not report it.
  • nindexes: The number of indexes on the collection.
  • totalIndexSize: The total disk space consumed by all indexes on the collection. Indexes perform best when they fit in memory, so this number matters for performance as well as for disk usage.
  • indexSizes: A breakdown of the disk space used by each individual index.
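
As a rough illustration of how these fields relate, the sketch below derives a few ratios from the sample output above. It is plain JavaScript, so it runs in Node or mongosh. The helper name summarizeStats and the ratios it reports are illustrative choices, not part of MongoDB’s API.

```javascript
// Sample stats document, copied from the simplified output above.
const stats = {
  ns: "mydatabase.users",
  size: 123456789,
  count: 1000000,
  avgObjSize: 123,
  storageSize: 200000000,
  nindexes: 2,
  totalIndexSize: 50000000,
  indexSizes: { _id_: 25000000, email_1: 25000000 },
};

// Hypothetical helper (not a MongoDB API): derive a few ratios from a stats document.
function summarizeStats(s) {
  const totalFootprint = s.storageSize + s.totalIndexSize;
  return {
    // avgObjSize is size divided by count; guard against empty collections.
    avgObjSize: s.count > 0 ? s.size / s.count : 0,
    // Allocated data bytes per logical (uncompressed) data byte.
    storageToSizeRatio: s.size > 0 ? s.storageSize / s.size : 0,
    // Data plus indexes: the collection's overall allocated footprint.
    totalFootprint,
    // Share of that footprint taken by indexes.
    indexShare: totalFootprint > 0 ? s.totalIndexSize / totalFootprint : 0,
  };
}

const summary = summarizeStats(stats);
console.log(summary);
```

In mongosh, the same helper could be fed live data with summarizeStats(db.users.stats()), assuming the numeric fields come back as plain numbers.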

The problem this command solves is understanding why your database is consuming so much disk space, or why certain queries are slow. You might have a seemingly small number of documents (count), but a massive storageSize or totalIndexSize can point to inefficiencies.

The core components you’re interacting with are the WiredTiger storage engine (in modern MongoDB versions) and the B-tree data structures used for both document storage and indexes. db.collection.stats() provides a window into how these structures are laying out your data on disk.

The storageSize is a great indicator of how much disk space is reserved for your collection’s data. WiredTiger compresses collection data by default, so storageSize reflects allocated, compressed space, while size reflects uncompressed document data. On a healthy collection, storageSize is therefore usually smaller than size. When it is close to size or larger, the gap usually comes from one or more of these factors:

  • WiredTiger’s internal block management: WiredTiger allocates file space in fixed-size units. Even if a block of data doesn’t fill its allocation perfectly, the whole unit is reserved.
  • Poorly compressible data: Documents that are already compressed or high in entropy (binary blobs, for example) shrink very little, so storageSize stays close to size.
  • Padding: The legacy MMAPv1 engine padded records to leave room for in-place growth, which inflated storageSize. WiredTiger does not pad documents.
  • Fragmentation: As documents are updated or deleted, the freed space stays inside the data file for reuse rather than being returned to the operating system, so storageSize does not shrink when data is removed.

The totalIndexSize is equally critical because indexes are often a major contributor to disk footprint, and every index adds work to each write. Indexes also perform best when they fit in memory, so a large totalIndexSize means your application pays a significant memory and disk I/O cost just to look up documents. You can see which indexes are the largest in indexSizes.

A common misconception is that size is the actual disk usage. It’s not. size is the logical, uncompressed size of the documents. storageSize is the physical space allocated on disk for the collection’s data, after compression. Comparing the two shows how well your data compresses and how much allocated space may be sitting unused.

If storageSize is close to or larger than size even though compression is enabled, the data file probably holds free space left by deleted or rewritten documents. WiredTiger reuses that space for new writes but does not return it to the operating system on its own. Note that db.collection.reIndex() only rebuilds indexes; it does not reclaim space in the collection’s data file, and MongoDB’s documentation notes that most users never need it. To release unused space, run the compact command on the collection. A mongodump and mongorestore rewrites the data from scratch and is the most thorough, albeit disruptive, option.

When you look at totalIndexSize, you’re looking at the sum of the sizes of all your B-tree indexes. Each index, while speeding up specific queries, consumes disk space and memory. If an index is rarely used but large, it’s a prime candidate for removal. Conversely, if a frequently queried field has no index, queries on that field will result in full collection scans, which are slow and resource-intensive. db.collection.stats() shows what each index costs in space, but not how often it is used. Pair it with the $indexStats aggregation stage for usage counts, and with explain() to spot queries that fall back to collection scans. The indexSizes field tells you which specific indexes are the largest.
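
To make the largest indexes easy to spot, a small sketch like the following can sort an indexSizes object by size. The helper name rankIndexes and the sample index names and sizes are made up for illustration. Only the shape of the input (index name mapped to bytes) comes from the stats output.

```javascript
// Hypothetical helper: list indexes from largest to smallest with their share of the total.
function rankIndexes(indexSizes) {
  const total = Object.values(indexSizes).reduce((sum, bytes) => sum + bytes, 0);
  return Object.entries(indexSizes)
    .map(([name, bytes]) => ({ name, bytes, share: total > 0 ? bytes / total : 0 }))
    .sort((a, b) => b.bytes - a.bytes);
}

// Made-up sizes for illustration only.
const ranked = rankIndexes({ _id_: 25000000, email_1: 40000000, createdAt_1: 10000000 });
console.log(ranked.map((ix) => `${ix.name}: ${(ix.share * 100).toFixed(1)}%`));
```

Size alone does not say whether an index earns its keep, so a ranking like this is a starting point for checking usage, not a removal list.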

The avgObjSize is a good sanity check. If this number suddenly spikes, it means your documents are growing, which could be due to adding new fields or storing larger data types (like large strings or embedded documents). This growth directly impacts size and, subsequently, storageSize.
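
One way to watch for such a spike is to compare avgObjSize between two saved stats snapshots. The sketch below assumes you keep those snapshots yourself. The helper name, the second snapshot’s numbers, and the 25% threshold are all illustrative assumptions.

```javascript
// Hypothetical helper: relative change in avgObjSize between two stats snapshots.
function avgObjSizeGrowth(before, after) {
  if (before.avgObjSize <= 0) return null; // no baseline to compare against
  return (after.avgObjSize - before.avgObjSize) / before.avgObjSize;
}

// Illustrative snapshots; only the first uses the numbers from the sample output.
const lastWeek = { count: 1000000, size: 123456789, avgObjSize: 123 };
const today = { count: 1010000, size: 186850000, avgObjSize: 185 };

const growth = avgObjSizeGrowth(lastWeek, today);
const THRESHOLD = 0.25; // assumed alerting threshold, tune for your workload
if (growth !== null && growth > THRESHOLD) {
  console.log(`avgObjSize grew by ${(growth * 100).toFixed(0)}%`);
}
```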

Consider a scenario where storageSize is 1GB, but size is only 200MB. Because storageSize holds compressed data, a 5x gap in this direction means most of the data file is free space, typically left behind after large deletes or heavy update churn. WiredTiger will reuse that space for new documents, so if the collection is expected to grow back, this might be acceptable. But if you’re running out of disk space, it’s a clear signal to act. db.collection.reIndex() rebuilds indexes only and does not rewrite the collection’s data blocks. Running compact on the collection can release the free space to the operating system, and for severe cases a mongodump and mongorestore is the most thorough solution.
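
A scenario like this can be quantified with a short sketch. It assumes the stats document includes a freeStorageSize field, which recent MongoDB versions report when a collection has reusable space. The helper name, the freeStorageSize value, and the 50% cutoff are hypothetical.

```javascript
// Hypothetical helper: fraction of the allocated data file that is free for reuse.
// Assumes the stats document carries freeStorageSize; treats it as 0 when absent.
function reusableFraction(s) {
  const free = s.freeStorageSize ?? 0;
  return s.storageSize > 0 ? free / s.storageSize : 0;
}

// Illustrative numbers for the 1GB vs 200MB scenario; freeStorageSize is made up.
const GB = 1024 ** 3;
const MB = 1024 ** 2;
const scenario = { size: 200 * MB, storageSize: 1 * GB, freeStorageSize: 900 * MB };

const fraction = reusableFraction(scenario);
const worthCompacting = fraction > 0.5; // assumed cutoff, not a MongoDB rule
console.log({ fraction, worthCompacting });
```

A high fraction suggests that compacting could release meaningful space. A low one points toward poorly compressible data instead.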

The next thing you’ll likely encounter after optimizing collection and index sizes is understanding query performance in relation to these metrics, which leads to analyzing query plans with db.collection.explain().
