Memcached is choking on requests because its internal worker threads are spending all their time waiting for network I/O to complete, preventing them from processing new incoming commands.
Common Causes and Fixes
1. Network Interface Saturation:
- Diagnosis: Check network interface statistics for errors, dropped packets, or excessive retransmits.

    sudo ethtool -S eth0 | grep -Ei 'drop|error|reje'

  Look for non-zero values in rx_dropped, tx_dropped, rx_errors, tx_errors. Also monitor overall network traffic on the server using iftop or nload. If your interface is consistently near its line rate (e.g., 1 Gbps or 10 Gbps), it’s a bottleneck.
- Fix: Upgrade network interface cards (NICs) or network infrastructure, or distribute the load across multiple servers. If you’re on a virtual machine, ensure the underlying physical NIC isn’t overloaded.
- Why it works: High network traffic or errors mean the Memcached server can’t even receive or send data fast enough, causing its worker threads to block on socket operations. Faster or less congested network hardware resolves this.
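To put a number on "near its line rate", the sketch below samples the kernel's per-interface byte counter twice and converts the difference into a percentage of link speed. The function names and the eth0 interface are assumptions for illustration. It also assumes a Linux host where /sys/class/net/<iface>/speed reports the link speed in Mb/s, which some virtual NICs do not do.

```shell
# Percentage of link capacity used between two byte-counter samples.
# Args: bytes_before bytes_after interval_seconds link_speed_mbps
link_utilization_pct() {
  awk -v a="$1" -v b="$2" -v t="$3" -v s="$4" \
    'BEGIN { printf "%.1f\n", (b - a) * 8 / (t * s * 1000000) * 100 }'
}

# Hypothetical helper: sample transmit traffic on one interface.
# Args: interface_name interval_seconds
sample_tx_utilization() {
  iface="$1"; secs="$2"
  before=$(cat "/sys/class/net/$iface/statistics/tx_bytes")
  sleep "$secs"
  after=$(cat "/sys/class/net/$iface/statistics/tx_bytes")
  # speed is in Mb/s; it may be -1 or unreadable on virtual interfaces
  speed=$(cat "/sys/class/net/$iface/speed")
  link_utilization_pct "$before" "$after" "$secs" "$speed"
}
```

Running `sample_tx_utilization eth0 5` prints the transmit utilization over five seconds. Swapping tx_bytes for rx_bytes measures the receive side.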
2. Insufficient Worker Threads:
- Diagnosis: Memcached uses worker threads to handle network I/O and command processing. If the number of threads is too low, they can’t keep up with the request rate. Check your Memcached startup configuration for the -t flag.

    ps aux | grep memcached
    # Example output: /usr/bin/memcached -m 64 -p 11211 -t 4 -u memcached

  Here, -t 4 indicates 4 worker threads. Compare this to the number of CPU cores on your server.
- Fix: Increase the number of worker threads. A common starting point is to set -t equal to the number of CPU cores.

    # Example: If you have 8 cores, modify your init script or systemd service file
    # to use -t 8 instead of the current value.
    # For systemd:
    #   sudo systemctl edit memcached.service
    # Add:
    #   [Service]
    #   ExecStart=
    #   ExecStart=/usr/bin/memcached -m 64 -p 11211 -t 8 -u memcached
    # Then:
    #   sudo systemctl daemon-reload
    #   sudo systemctl restart memcached

- Why it works: More worker threads allow Memcached to handle more concurrent network connections and process commands in parallel, reducing the time spent waiting on I/O.
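The comparison between the thread count and the core count can be scripted. The sketch below pulls the -t value out of a command line and falls back to 4, Memcached's default, when the flag is absent. The function name is made up for illustration, and it only recognizes the separated form `-t N`, not `-t8` or `--threads=8`.

```shell
# Print the value following -t in a memcached command line,
# or 4 (the memcached default) when no -t flag is present.
threads_from_cmdline() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i < NF; i++)
      if ($i == "-t") { print $(i + 1); found = 1; exit }
  }
  END { if (!found) print 4 }'
}
```

On a Linux host, `threads_from_cmdline "$(ps -o args= -C memcached | head -n 1)"` prints the configured thread count, and `nproc` prints the number of available cores to compare it against.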
3. High Number of Connections:
- Diagnosis: A very large number of active client connections can consume resources and lead to high CPU. Check the current number of connections.

    printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 | grep curr_connections
    # Example output: STAT curr_connections 15000

  If curr_connections is consistently in the tens of thousands or higher, it might be an issue.
- Fix:
  - Client-side: Implement connection pooling on your application servers. Reusing existing connections is far more efficient than establishing new ones for every request.
  - Server-side: Increase the maximum number of connections Memcached will accept using the -c flag (max_connections). The default is 1024.

    # Example: Increase max_connections to 2048
    # Modify your init script or systemd service file:
    #   sudo systemctl edit memcached.service
    # Add:
    #   [Service]
    #   ExecStart=
    #   ExecStart=/usr/bin/memcached -m 64 -p 11211 -t 4 -c 2048 -u memcached
    # Then:
    #   sudo systemctl daemon-reload
    #   sudo systemctl restart memcached

- Why it works: Connection pooling reduces the overhead of accept() and socket setup for each request. Increasing max_connections allows the server to accept more simultaneous clients without rejecting them, but this should be paired with sufficient worker threads.
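To read more than one connection counter without a separate grep for each, a small filter over the output of the `stats` command is enough. The sketch below assumes the text protocol's `STAT name value` line format and strips the trailing carriage returns that Memcached sends. The function name stat_value is hypothetical.

```shell
# Read memcached `stats` output on stdin and print the value of one stat.
# Arg: stat name, e.g. curr_connections
stat_value() {
  tr -d '\r' | awk -v name="$1" '$1 == "STAT" && $2 == name { print $3 }'
}
```

For example, `printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 | stat_value curr_connections` prints the current connection count. Calling it with `max_connections` shows the configured limit, and a non-zero `listen_disabled_num` suggests the server has hit that limit at least once.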
4. Excessive Evictions:
- Diagnosis: When Memcached runs out of memory, it starts evicting older items to make space for new ones. Frequent evictions indicate the cache is too small for the working set of data. Monitor evictions:

    printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 | grep -w evictions
    # Example output: STAT evictions 1500000

  evictions is a cumulative counter, so sample it more than once. A value that keeps climbing quickly over time points to this.
- Fix: Increase the memory allocation for Memcached using the -m flag, which takes a value in megabytes. Ensure the server has enough available RAM.

    # Example: Increase memory from 64 MB to 128 MB
    # Modify your init script or systemd service file:
    #   sudo systemctl edit memcached.service
    # Add:
    #   [Service]
    #   ExecStart=
    #   ExecStart=/usr/bin/memcached -m 128 -p 11211 -t 4 -u memcached
    # Then:
    #   sudo systemctl daemon-reload
    #   sudo systemctl restart memcached

- Why it works: By providing more memory, Memcached can hold more data, reducing the need to evict items. This means less eviction work and fewer cache misses, freeing up CPU for processing actual requests.
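Because the eviction counter only ever grows, a rate is more telling than the raw number. The sketch below takes two readings of the counter and reports evictions per second. Both function names are hypothetical, and the sampling helper assumes `nc` is installed and Memcached is reachable on the given host and port.

```shell
# Evictions per second between two readings of the evictions counter.
# Args: count_before count_after interval_seconds
evictions_per_sec() {
  awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN { printf "%.1f\n", (b - a) / t }'
}

# Hypothetical helper: read the current evictions counter.
# Args: host port
read_evictions() {
  printf 'stats\r\nquit\r\n' | nc "$1" "$2" | tr -d '\r' |
    awk '$1 == "STAT" && $2 == "evictions" { print $3 }'
}
```

A typical use is to store `read_evictions 127.0.0.1 11211` in a variable, sleep for 60 seconds, read it again, and pass both values plus 60 to `evictions_per_sec`. What counts as too high depends on the workload, so compare the result against a quiet period.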
5. Inefficient Key/Value Sizes or Operations:
- Diagnosis: Very large individual keys or values make every request more expensive. Memcached itself is a simple key-value store, but clients that serialize and deserialize large objects add cost on both ends. Monitor item sizes if possible, or analyze application logic. High CPU might also correlate with specific operations if you can segment traffic.
- Fix: Optimize your application to store smaller, more granular data. Avoid storing very large blobs directly in Memcached. Consider data serialization formats that are more compact.
- Why it works: Smaller items are faster to serialize, deserialize, and transfer over the network, reducing the processing burden on Memcached and the network.
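One rough way to monitor item sizes is the `stats slabs` command, which reports a chunk size and a used-chunk count for each slab class. The sketch below condenses that output into "chunk_size used_chunks" pairs for the classes that hold items. The function name is hypothetical. The result is only an approximation, since each item lands in the smallest class that fits it, so the chunk size is an upper bound on the item size.

```shell
# Read `stats slabs` output on stdin and print "chunk_size used_chunks"
# for every slab class with at least one used chunk, smallest size first.
slab_usage() {
  tr -d '\r' | awk '
    $1 == "STAT" && split($2, k, ":") == 2 {
      if (k[2] == "chunk_size")  size[k[1]] = $3
      if (k[2] == "used_chunks") used[k[1]] = $3
    }
    END { for (c in size) if (used[c] > 0) print size[c], used[c] }
  ' | sort -n
}
```

It can be fed with `printf 'stats slabs\r\nquit\r\n' | nc 127.0.0.1 11211 | slab_usage`. Many used chunks in the largest classes suggest that big values make up a meaningful share of the cache.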
6. CPU Throttling (Virtualization/Containers):
- Diagnosis: If Memcached is running in a VM or container, it might be subject to CPU limits imposed by the hypervisor or container orchestrator. Check the CPU usage metrics provided by your virtualization platform or container runtime. For containers, docker stats or kubectl top pod can show CPU usage relative to limits.
- Fix: Increase the CPU allocation or remove/relax CPU limits for the Memcached VM or container.
- Why it works: CPU limits directly restrict the processing power available to Memcached, forcing its threads into a waiting state when they would otherwise be active.
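On Linux, the kernel also records throttling directly in the cgroup's cpu.stat file, which is a more direct signal than usage percentages. The sketch below reads that file's contents and reports what share of scheduling periods were throttled. The function name is hypothetical, and the path in the usage note assumes cgroup v2 as seen from inside the container. Under cgroup v1 the file usually sits under the cpu controller's directory instead.

```shell
# Read cpu.stat contents on stdin and print the percentage of
# enforcement periods in which the cgroup was throttled.
throttled_pct() {
  awk '
    $1 == "nr_periods"   { p = $2 }
    $1 == "nr_throttled" { t = $2 }
    END { if (p > 0) printf "%.1f\n", t / p * 100; else print "0.0" }
  '
}
```

Inside the container, `throttled_pct < /sys/fs/cgroup/cpu.stat` prints the percentage since the cgroup was created. A value well above zero that keeps growing under load is a sign the CPU limit is too tight for Memcached.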
After addressing these, you might encounter issues with slab_realloc if your memory allocation is too small and you are frequently growing your slabs.