MongoDB’s deleteMany operation doesn’t immediately return disk space to the operating system.

The core issue is that when you delete documents from a MongoDB collection, the space they occupied isn’t returned to the operating system. Instead, MongoDB marks that space as available for reuse within the collection’s own data file. This is a performance optimization; constantly resizing data files would be very slow. However, after large deletions the data files stay at their previous size, so you’ll see high disk usage even though the actual data stored is much smaller.

Here’s how to reclaim that space and what to check for:

1. The Obvious: compact Command (and why it’s not always the answer)

The compact command is the most direct way to tell MongoDB to reorganize its data files and release space.

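  • Diagnosis: Run db.collection.stats() and compare size (uncompressed data size) with storageSize (space allocated on disk). Because WiredTiger compresses data, storageSize can be smaller than size even when the file holds a lot of free space, so also check wiredTiger["block-manager"]["file bytes available for reuse"] in the same output. If that value is a large share of storageSize, compact might help.
  • Fix: Execute db.runCommand({ compact: "your_collection_name" }). There is no database-wide variant; run the command once per collection. On a replica set the command is not replicated, so run it on each member separately, preferably on secondaries first.
  • Why it works: On WiredTiger, compact rewrites the collection’s data and indexes within the existing files, moving live blocks toward the start of each file and then truncating the free space left at the end. Only that truncated space is returned to the operating system.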

Important Caveat: compact can be a heavy, I/O-intensive operation. Before MongoDB 4.4 it blocked all reads and writes on the database that owns the collection, which for large collections could mean significant downtime. From 4.4 onward it blocks only metadata operations such as dropping the collection or creating and dropping its indexes. It may also release less space than storageSize suggests, depending on where the free blocks sit in the file.
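As a rough way to decide whether compact is worth its cost, the sketch below reads the free-space figure WiredTiger reports in the collection stats. The helper names and the 0.3 threshold are illustrative assumptions, not part of MongoDB; the only MongoDB-specific inputs are the storageSize field and the "file bytes available for reuse" statistic from db.collection.stats().

```javascript
// Sketch: estimate how much of a collection's file is free space that compact could release.
// `stats` is the document returned by db.getCollection(name).stats() in mongosh.
function reclaimableSummary(stats) {
  const blockManager = (stats.wiredTiger && stats.wiredTiger["block-manager"]) || {};
  const reusableBytes = Number(blockManager["file bytes available for reuse"] || 0);
  const storageBytes = Number(stats.storageSize || 0);
  const reusableFraction = storageBytes > 0 ? reusableBytes / storageBytes : 0;
  return { storageBytes, reusableBytes, reusableFraction };
}

// Hypothetical helper: the 0.3 threshold is an illustrative default, not a MongoDB recommendation.
function shouldCompact(stats, minFraction = 0.3) {
  return reclaimableSummary(stats).reusableFraction >= minFraction;
}

// Possible use in mongosh (not executed here):
//   const stats = db.getCollection("your_collection_name").stats();
//   if (shouldCompact(stats)) db.runCommand({ compact: "your_collection_name" });
```

Keeping the decision in a pure function makes it easy to try against saved stats output before touching a production member.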

2. WiredTiger’s Internal Management: evict and checkpoint

For the default WiredTiger storage engine, compact is often unnecessary. WiredTiger reuses freed space on its own, so a collection that keeps receiving writes will fill the holes left by deletes instead of growing further.

  • Diagnosis: Check WiredTiger’s internal statistics. db.serverStatus().wiredTiger.cache reports memory use (bytes currently in the cache against maximum bytes configured), and db.serverStatus().wiredTiger.log reports journal activity such as log bytes written. Neither shows free space on disk; for that, read file bytes available for reuse under wiredTiger["block-manager"] in db.collection.stats().
  • Fix: WiredTiger performs checkpoints automatically (every 60 seconds by default), writing modified cached data to the data files. Blocks freed by deletes become reusable once a checkpoint completes. serverStatus only reports statistics and does not trigger anything; to force a flush yourself, run db.adminCommand({ fsync: 1 }). A checkpoint rarely shrinks a file, though: WiredTiger can only truncate free space at the end of a file, so space freed in the middle stays allocated until new writes reuse it or compact moves data out of the way.
  • Why it works: WiredTiger stores each collection and index as a B-tree in its own file. Changes are made to pages in the cache, eviction writes pages out when the cache is under pressure, and checkpoints periodically write a consistent snapshot to disk. After a checkpoint, blocks that belonged only to older checkpoints return to the file’s free list, and journal entries from before the checkpoint are no longer needed for recovery.
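To make the statistics above easier to read, here is a minimal sketch that summarizes cache usage from a serverStatus document and wraps the fsync admin command, which flushes pending writes to disk. The function names are hypothetical; the statistic keys are the ones WiredTiger reports under serverStatus().wiredTiger.cache. Note that cache figures describe memory, not disk.

```javascript
// Sketch: summarize WiredTiger cache usage from the output of db.serverStatus().
function cacheUsage(serverStatus) {
  const cache = serverStatus.wiredTiger.cache;
  const usedBytes = Number(cache["bytes currently in the cache"]);
  const maxBytes = Number(cache["maximum bytes configured"]);
  return { usedBytes, maxBytes, usedFraction: maxBytes > 0 ? usedBytes / maxBytes : 0 };
}

// Hypothetical wrapper: `db` is any object exposing adminCommand(), such as the mongosh `db` handle.
// fsync asks the storage layer to flush pending writes to disk.
function forceFlush(db) {
  return db.adminCommand({ fsync: 1 });
}

// Possible use in mongosh (not executed here):
//   printjson(cacheUsage(db.serverStatus()));
//   forceFlush(db);
```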

3. Journaling and Log Files

MongoDB’s journaling is crucial for durability but can also consume disk space.

  • Diagnosis: Check the size of your journal files in your MongoDB data directory (usually under journal/). You can also see journal activity in db.serverStatus().wiredTiger.log.
  • Fix: Journaling is on by default, and from MongoDB 6.1 it cannot be disabled (the storage.journal.enabled setting was removed). WiredTiger starts a new journal file roughly every 100 MB and removes old files automatically once a checkpoint makes them unnecessary. If the journal directory keeps growing, it might indicate stalled checkpoints or a write volume that outpaces them. Running db.adminCommand({ fsync: 1 }) forces a flush to disk, after which older journal files can be removed.
  • Why it works: Journaling writes all data modifications to a write-ahead log before they are applied to the main data files. This log is used for recovery. Once data is durably written to data files during a checkpoint, older journal entries can be safely removed.

4. Replica Set Members and Resync

If you’re using replica sets, you might have space issues on secondary members.

  • Diagnosis: Compare storageSize on the primary vs. secondary members by running db.collection.stats() on each member. If a secondary is significantly larger for the same collection, its files are most likely more fragmented: each member manages its own data files, and compact is not replicated. Check replication lag separately with rs.printSecondaryReplicationInfo() (rs.printSlaveReplicationInfo() in older shells).
  • Fix: Restarting the mongod process on a secondary can help it recover from lag, but it does not shrink its data files. To reclaim the space, resync the member: stop the mongod on the secondary, clear its data directory, and restart it with the same --replSet option so that it performs an initial sync from another member. Under a very high write load, make sure the oplog is large enough to cover the duration of the sync (--oplogSize sets the size when the oplog is first created).
  • Why it works: An initial sync gives the secondary a clean, up-to-date copy of the data, written into freshly created files. This effectively discards any bloat or inconsistencies on the secondary.
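The comparison in the diagnosis step can be sketched as a small helper. It assumes you have already collected storageSize for one collection from each member (for example by connecting to each host in turn); the function name, the host names, and the 1.5 ratio are illustrative assumptions.

```javascript
// Sketch: list members whose storageSize for a collection is much larger than the primary's.
// `storageSizeByHost` maps a host name to the storageSize reported by stats() on that member.
function bloatedMembers(storageSizeByHost, primaryHost, ratio = 1.5) {
  const primarySize = storageSizeByHost[primaryHost];
  if (!(primarySize > 0)) {
    throw new Error("unknown or zero storageSize for primary " + primaryHost);
  }
  return Object.entries(storageSizeByHost)
    .filter(([host, size]) => host !== primaryHost && size / primarySize >= ratio)
    .map(([host, size]) => ({ host, size, ratio: size / primarySize }));
}

// Example with made-up numbers: only the member at 3x the primary's size is reported.
const suspects = bloatedMembers(
  { "db1.example:27017": 100, "db2.example:27017": 120, "db3.example:27017": 300 },
  "db1.example:27017"
);
```

A member flagged this way is a candidate for compact or a resync, not proof of a replication problem.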

5. Background Operations and Index Builds

Heavy background tasks can temporarily increase disk usage.

  • Diagnosis: Monitor db.currentOp() for long-running index builds. They show up as operations with "op": "command" and a command.createIndexes field, or with a msg that starts with "Index Build".
  • Fix: Let index builds complete. They will eventually finish and free up any temporary space used. If an index build is stuck or taking an unreasonable amount of time, investigate why (e.g., insufficient resources, complex index keys). From MongoDB 4.2 the background: true option is ignored, because all builds use the same process. To cancel a build, db.collection.dropIndex("index_name") aborts an in-progress build from MongoDB 4.4 onward; on older versions, use db.killOp() with the operation’s opid.
  • Why it works: An index build writes a new index file and, for large collections, may spill intermediate sort data to temporary files under the data directory (in _tmp). The temporary files are removed when the build finishes or is aborted. The finished index itself keeps its space for as long as it exists.
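Spotting index builds in currentOp output can be sketched as a filter over the inprog array. The function name is hypothetical, and the two conditions (a createIndexes command, or a msg beginning with "Index Build") are the fields index-build operations normally expose; treat the exact shape as an assumption to verify on your version.

```javascript
// Sketch: pick index-build operations out of db.currentOp().inprog.
function findIndexBuilds(inprog) {
  return inprog.filter((op) => {
    const isCreateIndexes =
      op.op === "command" && op.command != null && op.command.createIndexes !== undefined;
    const isBuildMessage = typeof op.msg === "string" && /^Index Build/.test(op.msg);
    return isCreateIndexes || isBuildMessage;
  });
}

// Possible use in mongosh (not executed here):
//   findIndexBuilds(db.currentOp().inprog).forEach((op) =>
//     print(op.opid + " running for " + op.secs_running + "s"));
```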

6. mongodump / mongorestore and repair

These are more drastic measures, often used when corruption is suspected or for a complete refresh.

  • Diagnosis: If you suspect data corruption or have tried other methods without success, a mongodump followed by a mongorestore can be a way to get a clean copy of your data.
  • Fix:
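    1. Run mongodump --db your_database_name --out /path/to/backup. If the instance holds other databases (including users and roles in admin), omit --db to dump everything, because step 3 deletes all of them.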
    2. Stop your mongod instance.
    3. Remove the existing data directory.
    4. Start mongod with an empty data directory.
    5. Run mongorestore --db your_database_name /path/to/backup/your_database_name.
    Alternatively, for older versions or specific issues, mongod --repair could be considered, but it’s very disruptive and rarely the best option with WiredTiger.
  • Why it works: This process reads only the valid data from your existing files, writes it to new files during the dump, and then creates entirely new data files during the restore. It effectively rebuilds the entire collection/database from scratch.

Once disk usage is under control, the next problem you’ll likely hit is network connectivity, or insufficient RAM if your working set is too large for the available memory.
