The mongod process on your secondary replica set members is not applying oplog entries from the primary as fast as the primary writes them, so the secondaries fall behind (replication lag). This happens when a secondary receives oplog entries too slowly over the network, applies them too slowly, or is slowed down by other operations running on the same server.
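To put a number on the lag, you can compare each secondary's last applied optime with the primary's. The sketch below is plain JavaScript that you can paste into `mongosh` and call as `replicationLagSeconds(rs.status())`. It assumes the shape of the `rs.status()` output (a `members` array whose entries carry `name`, `stateStr`, and `optimeDate`), and the function name is hypothetical; `rs.printSecondaryReplicationInfo()` reports similar figures without any code.

```javascript
// Hypothetical helper: per-secondary lag in seconds, computed from the
// document returned by rs.status() in mongosh.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("rs.status() output contains no PRIMARY member");
  }
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      // optimeDate is the wall-clock time of the member's last applied oplog entry
      lagSeconds: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
    }));
}
```

If the primary receives no writes for a while, its optime stops advancing, so a value near zero only means the secondary has applied everything the primary has written so far.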
Common Causes and Fixes for MongoDB Oplog Replication Lag
1. Insufficiently Sized wiredTigerCacheSizeGB
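- Diagnosis: Check the secondary for signs of cache pressure. Run `db.serverStatus()` and, in its `wiredTiger.cache` section, compare `bytes currently in the cache` with `maximum bytes configured`. By default WiredTiger tries to keep usage near 80% of the maximum, so a cache that stays at or above that level while `pages evicted by application threads` keeps climbing suggests the cache is too small. Slow-operation entries in the secondary's MongoDB log and heavy disk I/O often accompany this.
- Cause: The WiredTiger storage engine uses a cache to hold data and index pages. If this cache is too small, MongoDB evicts pages frequently, which causes more disk reads and writes and slows down oplog application.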
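- Fix: Increase the WiredTiger cache size, set as `storage.wiredTiger.engineConfig.cacheSizeGB` in your `mongod.conf` file (the command-line equivalent is `--wiredTigerCacheSizeGB`). By default MongoDB uses the larger of 50% of (RAM - 1 GB) or 256 MB, so this fix matters most when the cache was set below that default, for example on a server shared with other services. On a dedicated server, adding RAM is usually safer than pushing the cache far above the default, because WiredTiger also relies on the operating system's file system cache. For example, on a dedicated server with 32 GB of RAM whose cache had been capped lower, setting it to 16 restores roughly the default (15.5 GB). Restart the `mongod` service for the change to take effect.

```yaml
storage:
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      # Cache size in GB; 16 suits a dedicated server with 32 GB of RAM
      cacheSizeGB: 16
    collectionConfig:
      blockCompressor: snappy
```

- Why it works: A larger cache allows more data and index pages to reside in memory, reducing the need for disk I/O and speeding up the read and write operations required to apply oplog entries.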
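The cache check from the diagnosis step and the default-size rule can be sketched in the same `mongosh`-compatible JavaScript. This is a sketch under stated assumptions: the 80% threshold reflects WiredTiger's default eviction target and will differ if eviction settings were tuned, the helper names are hypothetical, and `toNum` is there because `mongosh` may return 64-bit counters as BSON `Long` values.

```javascript
// Accept plain numbers or BSON Long values (which expose toNumber()).
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Hypothetical helper: how full the WiredTiger cache is, from db.serverStatus().
function cacheUsage(serverStatus, evictionTarget = 0.8) {
  const cache = serverStatus.wiredTiger.cache;
  const max = toNum(cache["maximum bytes configured"]);
  const used = toNum(cache["bytes currently in the cache"]);
  const dirty = toNum(cache["tracked dirty bytes in the cache"]);
  return {
    fillRatio: used / max,
    dirtyRatio: dirty / max,
    // 0.8 assumes WiredTiger's default eviction target of 80%
    overEvictionTarget: used / max >= evictionTarget,
  };
}

// MongoDB's default cache size: the larger of 50% of (RAM - 1 GB) or 256 MB.
function defaultCacheSizeGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}
```

In `mongosh`, call `cacheUsage(db.serverStatus())` on the secondary; `defaultCacheSizeGB(32)` returns 15.5.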
2. Slow Disk I/O on Secondaries
- Diagnosis: Monitor disk I/O on the secondary servers with a tool such as `iostat -xz 5` (Linux). Look for high `%util`, `r_await`/`w_await`, and `aqu-sz` values (older sysstat releases also print `svctm`, which the `iostat` man page warns is unreliable). Also check the MongoDB log for slow-operation entries that coincide with the I/O spikes.
- Cause: A secondary writes each batch of fetched oplog entries to its own oplog and then applies them to its data files and indexes, with journal writes on top. If the underlying storage is slow, these writes become the bottleneck and delay oplog application.
- Fix: Upgrade to faster storage (for example, SSDs instead of HDDs) or tune your RAID configuration: use a RAID level suited to write-heavy workloads, such as RAID 10, and make sure the Linux I/O scheduler suits SSDs (`noop` or `deadline` on older kernels, `none` or `mq-deadline` on multi-queue kernels). Secondaries should have storage comparable to the primary's, because they must absorb the same write volume.
- Why it works: Faster disk I/O reduces the latency of writing and applying oplog entries, allowing the secondary to keep up with the primary's write rate.
3. Network Bandwidth Saturation or Latency
- Diagnosis: Use `iperf3` to test network throughput between the primary and secondary. Monitor the secondary's network interfaces with `nload` or `iftop` for high bandwidth utilization, and use `ping` and `mtr` to check latency and packet loss between nodes.
- Cause: If the link between the secondary and its sync source (usually the primary, although with chained replication, the default, it can be another secondary) is saturated or suffers from high latency or packet loss, oplog entries do not reach the secondary fast enough, and the gap widens.
- Fix: Increase network bandwidth, reduce other network traffic on the link, or optimize network configuration (e.g., ensure jumbo frames are properly configured if used). For instance, if you’re using 1Gbps links and seeing saturation, upgrade to 10Gbps.
- Why it works: Sufficient bandwidth and a stable network connection ensure that oplog entries are transmitted quickly and reliably from the primary to the secondary.
4. Long-Running Read Operations on Secondaries
- Diagnosis: Run `db.currentOp(true)` on the secondary to find long-running read operations (queries), and look for operations with high `secs_running` or `microsecs_running` values.
- Cause: Since MongoDB 4.0, reads on secondaries run against a snapshot and no longer block oplog batch application, but long-running reads still compete with the applier for CPU, disk I/O, and cache. A long-lived read also keeps an old snapshot open, which forces WiredTiger to retain older versions of documents and adds cache pressure.
- Fix: Optimize slow read queries on the secondary: check their plans with `explain("executionStats")` and add indexes where a query scans the whole collection (`COLLSCAN`). Schedule large read operations during off-peak hours or run them on a dedicated secondary reserved for reporting. For example, if a query takes 5 minutes, first confirm that its plan uses an appropriate index.
- Why it works: Reducing the duration and resource footprint of read operations leaves the secondary's `mongod` process more capacity to apply oplog entries efficiently.
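The filtering step can be sketched in the same JavaScript: it takes the document returned by `db.currentOp(true)` and lists query, getMore, and command operations (aggregations report `op: "command"`) that have run longer than a threshold, longest first. The helper name, the 10-second default, and the choice to skip `local.oplog.rs` (oplog-tailing cursors from members that sync from this one stay open indefinitely by design) are assumptions for illustration.

```javascript
// Accept plain numbers or BSON Long values (which expose toNumber()).
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Hypothetical helper: long-running operations from db.currentOp(true) output.
function longRunningOps(currentOpResult, minSeconds = 10) {
  const readLike = new Set(["query", "getmore", "command"]);
  return currentOpResult.inprog
    .filter((op) => readLike.has(op.op))
    // Oplog-tailing cursors never finish by design, so ignore them.
    .filter((op) => op.ns !== "local.oplog.rs")
    .map((op) => ({ opid: op.opid, op: op.op, ns: op.ns, secs: toNum(op.secs_running ?? 0) }))
    .filter((op) => op.secs >= minSeconds)
    .sort((a, b) => b.secs - a.secs);
}
```

In `mongosh`, `longRunningOps(db.currentOp(true), 30)` lists operations running for 30 seconds or more; once you have confirmed one is safe to stop, `db.killOp(opid)` terminates it.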
5. Large Document Updates or Deletes
- Diagnosis: Examine the most recent oplog entries on the primary (`db.getSiblingDB("local").oplog.rs.find().sort({$natural: -1}).limit(10)`) for bursts of entries produced by a single bulk operation, or for entries with very large payloads, such as full replacements of large documents.
- Cause: A single operation on the primary can turn into a large amount of replication work. An `updateMany` or `deleteMany` that touches millions of documents writes one oplog entry per document, and replacing a large document logs the entire new document. The secondary has to apply all of that work, which causes a temporary spike in lag.
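- Fix: Break large operations into smaller batches. For instance, instead of one `deleteMany` over millions of documents, delete a few thousand documents at a time and let the secondaries catch up between batches, for example by using a `majority` write concern on each batch. For large cleanups, consider a phased deletion approach.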
- Why it works: Smaller batches spread the replication work over time, so the secondary never has to absorb one large, time-consuming burst of oplog entries at once.
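The batching fix can be sketched as a small `mongosh` function. It is a hypothetical helper, not a built-in: it repeatedly fetches the `_id` values of up to `batchSize` matching documents, deletes those documents, and passes a `majority` write concern so each batch must reach a majority of members before the next one starts. It relies on `mongosh` resolving driver calls synchronously; with the Node.js driver you would `await` each call.

```javascript
// Hypothetical helper for mongosh: delete documents matching `filter` in
// batches, so each round adds at most `batchSize` delete entries to the oplog.
function deleteInBatches(coll, filter, batchSize = 1000) {
  let total = 0;
  for (;;) {
    // Fetch only the _id values of the next batch of matching documents.
    const batch = coll.find(filter, { _id: 1 }).limit(batchSize).toArray();
    const ids = batch.map((doc) => doc._id);
    if (ids.length === 0) {
      return total;
    }
    // Re-check the filter so documents that stopped matching are left alone;
    // w: "majority" waits until a majority of members have the batch.
    const res = coll.deleteMany(
      { $and: [filter, { _id: { $in: ids } }] },
      { writeConcern: { w: "majority" } }
    );
    total += res.deletedCount;
  }
}
```

For example, `deleteInBatches(db.getSiblingDB("app").events, { createdAt: { $lt: ISODate("2023-01-01") } }, 5000)`, where the `app` database, `events` collection, and `createdAt` field are hypothetical. In a three-member set, `majority` means the primary plus one secondary, so the slowest secondary can still trail slightly.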
6. Insufficiently Sized Oplog (oplogSizeMB)
- Diagnosis: Run `rs.printReplicationInfo()` on the primary to see the oplog's configured size and the time span it currently covers (the oplog window), and `rs.printSecondaryReplicationInfo()` to see how far each secondary is behind. If a secondary's lag approaches the oplog window, the oplog can roll over before that secondary catches up. An undersized oplog does not cause lag by itself; it limits how much lag a secondary can recover from.
- Cause: The oplog is a capped collection. If it is too small relative to the write volume, old entries are overwritten before a lagging secondary has fetched them, and that secondary becomes too stale to catch up and must resync from scratch.
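- Fix: Increase the oplog size. A common recommendation is to size it to hold at least 24-72 hours of oplog data. For example, if your oplog grows by about 100 MB per hour, 72 hours needs about 7,200 MB; a 3 GB (3,072 MB) oplog holds only about 30 hours. The `replication.oplogSizeMB` setting only takes effect when a member first creates its oplog, so to resize an existing oplog, run `db.adminCommand({replSetResizeOplog: 1, size: 7200})` (size in MB) on each member; no restart is required. Size the oplog on every member, not only the primary, because any member can become primary or act as another member's sync source.

```yaml
replication:
  oplogSizeMB: 7200  # about 72 hours at ~100 MB/hour; used only when the oplog is first created
```

- Why it works: A larger oplog provides a longer retention period, giving secondaries more time to catch up even if they experience temporary lag.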
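The sizing arithmetic from the fix step can be sketched as three small functions (hypothetical names). The growth estimate assumes `db.getReplicationInfo()` in `mongosh`, whose `usedMB` and `timeDiffHours` fields give the oplog data currently stored and the time span it covers; dividing them yields only an average, so size for your peak write periods rather than this figure alone.

```javascript
// Hours of history an oplog of `oplogSizeMB` holds at a given growth rate.
function oplogWindowHours(oplogSizeMB, growthMBPerHour) {
  return oplogSizeMB / growthMBPerHour;
}

// Oplog size (MB) needed to retain `targetHours` of writes.
function requiredOplogMB(growthMBPerHour, targetHours) {
  return Math.ceil(growthMBPerHour * targetHours);
}

// Average oplog growth rate estimated from db.getReplicationInfo() output.
function growthMBPerHour(replInfo) {
  return replInfo.usedMB / replInfo.timeDiffHours;
}
```

With the figures from the example, `requiredOplogMB(100, 72)` returns 7200 and `oplogWindowHours(3072, 100)` returns about 30.7.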
7. High CPU Utilization on Secondaries
- Diagnosis: Monitor CPU usage on the secondary servers with `top` or `htop`. If `mongod` consistently consumes close to 100% of one or more CPU cores, oplog application can slow down.
- Cause: High CPU usage can come from many sources, including inefficient queries, background operations, or simply the overhead of applying a high volume of oplog entries.
- Fix: Identify the source of high CPU. This might involve optimizing queries, reducing background task frequency, or even upgrading the hardware. For example, if a specific aggregation pipeline is causing high CPU, optimize it or run it on a separate read-only node.
- Why it works: Freeing up CPU resources leaves the secondary's `mongod` process more processing power for applying oplog entries.
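To check whether the applier itself is the bottleneck, you can sample the cumulative `metrics.repl.apply` counters from `db.serverStatus()` on the secondary twice and compare them. A sketch, assuming the `batches.num`, `batches.totalMillis`, and `ops` fields of that section; the function name is hypothetical. A rising average batch time while `mongod` is pinned on CPU is consistent with a CPU-bound secondary, though it does not prove it.

```javascript
// Accept plain numbers or BSON Long values (which expose toNumber()).
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Hypothetical helper: oplog-apply activity between two serverStatus() samples.
function applyDelta(before, after) {
  const a = before.metrics.repl.apply;
  const b = after.metrics.repl.apply;
  const batches = toNum(b.batches.num) - toNum(a.batches.num);
  const millis = toNum(b.batches.totalMillis) - toNum(a.batches.totalMillis);
  const ops = toNum(b.ops) - toNum(a.ops);
  return {
    batches,
    ops,
    // Average time spent applying one batch during the interval.
    avgBatchMillis: batches > 0 ? millis / batches : 0,
    // Operations applied per second of apply time (not wall-clock time).
    opsPerApplySecond: millis > 0 ? (ops * 1000) / millis : 0,
  };
}
```

In `mongosh`: `const s1 = db.serverStatus(); sleep(60000); const s2 = db.serverStatus(); applyDelta(s1, s2);`.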
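Fixing these causes stops the lag from growing, but it cannot rescue a secondary that has already fallen further behind than its sync source's oplog window. Such a member reports that it is too stale to catch up, stays in the `RECOVERING` state, and needs a full resync (initial sync).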