Memcached is evicting items and reporting high memory usage, even though the data it holds is smaller than its memory limit and it’s not serving all the data it should.
The core issue is that memcached’s slab allocator has become unbalanced. Memcached divides its memory into pages (1MB by default) and assigns each page to a slab class; every slab class cuts its pages into chunks of one fixed size. When an item is added, it’s placed in a chunk from the smallest class whose chunk size can hold it. Once a page belongs to a class it normally stays there, so if the distribution of item sizes shifts over time, you can end up with some classes that are completely full while others hold pages of mostly empty chunks. Memory stranded in the wrong class can’t be reused where it’s needed, which leads to evictions (dropping older items to make space) and to high reported memory usage even when the total data size is less than the allocated memory.
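The class selection described above can be sketched in a few lines of Python. This is a simplified model, not memcached’s actual code: the 48-byte item header, the 8-byte alignment, the 1MB largest chunk, the cap on the number of classes, and the function names chunk_sizes and pick_chunk are all assumptions chosen for illustration.

```python
from bisect import bisect_left

def chunk_sizes(factor=1.25, min_space=48, header=48,
                max_chunk=1024 * 1024, align=8, max_classes=63):
    """Model the chunk size of each slab class (illustrative, not memcached's code)."""
    sizes = []
    size = header + min_space  # assumed: item header plus the -n value
    # Grow geometrically until the next class would reach the largest chunk.
    while len(sizes) < max_classes - 1 and size < max_chunk / factor:
        if size % align:
            size += align - size % align  # round up to the alignment boundary
        sizes.append(size)
        size = int(size * factor)
    sizes.append(max_chunk)  # the last class holds the largest items
    return sizes

def pick_chunk(sizes, item_bytes):
    """Return the smallest chunk size that fits item_bytes, or None if it is too big."""
    i = bisect_left(sizes, item_bytes)
    return sizes[i] if i < len(sizes) else None

sizes = chunk_sizes()
print(sizes[:6])                # first few chunk sizes under these assumptions
print(pick_chunk(sizes, 100))   # a 100-byte item goes to the next chunk size up
```

The gap between the item’s size and the chunk it lands in is space that no other item can use, which is why the spacing of the classes matters so much.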
Here are the common causes and how to fix them:
Cause 1: A sudden influx of many small items
If your application suddenly starts storing a large number of very small objects, they will all be allocated from the smallest slab classes. This can quickly fill those classes. Because eviction works per slab class, memcached then evicts older items from those same classes to make room, even while other classes have free chunks, or refuses new items if evictions are disabled with -M.
Diagnosis:
Monitor your memcached statistics, specifically curr_items (from stats) and active_slabs (from stats slabs). Look for a disproportionately high number of items relative to the total memory used, and check chunk_size and used_chunks for the smallest slab classes.
echo "stats items" | nc localhost 11211
echo "stats slabs" | nc localhost 11211
Pay close attention to the slab classes with the smallest chunk_size (typically 96 bytes with default settings on a 64-bit build). If used_chunks is very high and free_chunks is low for these classes, and stats items shows a rising evicted count for them, you’ve found your culprit.
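If you’d rather not read the raw output by eye, a small script can group the stats slabs reply by class. The sketch below assumes the text protocol’s "STAT <class>:<name> <value>" line format; the function names fetch_stats and parse_slab_stats are hypothetical, and the sample reply is made up for illustration, not real server output.

```python
import socket
from collections import defaultdict

def fetch_stats(command="stats slabs", host="localhost", port=11211, timeout=2.0):
    """Send a stats command over the text protocol and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", errors="replace")

def parse_slab_stats(text):
    """Group 'STAT <class>:<name> <value>' lines by slab class id."""
    classes = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "STAT" or ":" not in parts[1]:
            continue  # skip END and the global (unprefixed) stats
        class_id, name = parts[1].split(":", 1)
        classes[int(class_id)][name] = int(parts[2])
    return dict(classes)

# Made-up sample in the shape of a stats slabs reply (not real server output).
sample = (
    "STAT 1:chunk_size 96\r\n"
    "STAT 1:total_chunks 10922\r\n"
    "STAT 1:used_chunks 10922\r\n"
    "STAT 1:free_chunks 0\r\n"
    "STAT active_slabs 1\r\n"
    "END\r\n"
)
for class_id, s in sorted(parse_slab_stats(sample).items()):
    full = s["used_chunks"] / s["total_chunks"]
    print(class_id, s["chunk_size"], f"{full:.0%} full")
```

Against a live server you would pass fetch_stats() to parse_slab_stats() instead of the sample. The parser only handles stats slabs; stats items uses a different key layout.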
Fix:
The most direct fix is to restart memcached with slab classes that match your item sizes more closely. This is done by adjusting the -f (growth factor) and -n (minimum space allocated for an item’s key, value, and flags) options when starting memcached. A smaller growth factor means more slab classes, leading to finer-grained allocation. Raising the maximum item size (-I) does not help here, because the problem is at the small end of the size range.
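For example, with the default growth factor of 1.25 and a starting chunk size of 64 bytes, an idealized progression of chunk sizes looks like 64, 80, 100, 125, 156, 195, 244, 305, 381, 476, 595, 744, 930, 1162, 1452, 1815, 2269, 2836, 3545, 4432, 5540, 6925, 8656, 10820, 13525, 16906, 21132, 26415, 33018, 41272, 51590, 64487, 80609, 100761, 125951, 157439, 196799, 245999. Real chunk sizes differ slightly, because memcached adds the item header to the -n value and rounds each size up to an 8-byte boundary. Starting memcached with -vv prints the actual slab classes.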
If you see many small items, you might benefit from a smaller growth factor, say 1.1, which would create more intermediate slab sizes. Or, if you know you have many items around 100 bytes, you could explicitly define slab classes.
Example restart command (assuming current memory limit of 10GB):
memcached -m 10240 -f 1.1 -n 64
This starts memcached with a 10GB memory limit, a growth factor of 1.1 (creating more slab classes between sizes), and a minimum item space of 64 bytes. Keep in mind that restarting memcached empties the cache.
Why it works: A smaller growth factor means more distinct slab sizes are created, allowing items to be allocated into more appropriately sized chunks, reducing fragmentation and waste within slabs.
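A rough way to see the effect of the growth factor: an item just slightly larger than one chunk size has to go into the next class, whose chunks are about f times bigger, so the unused share of that chunk approaches 1 - 1/f. The helper below (worst_case_waste is a hypothetical name) computes that bound. It is an approximation that ignores alignment and the per-item header.

```python
def worst_case_waste(factor):
    """Approximate upper bound on the unused fraction of a chunk for a growth factor."""
    return 1 - 1 / factor

for f in (1.25, 1.1):
    print(f, round(worst_case_waste(f), 3))
```

Under this approximation, dropping the factor from 1.25 to 1.1 cuts the worst-case padding per item from roughly a fifth of the chunk to under a tenth.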
Cause 2: A large number of very large items
Conversely, if your application starts storing many large objects, memcached assigns pages to the larger slab classes to hold them. If these large items are also short-lived, their chunks are freed when they expire or are deleted, but a free chunk can only be reused by an item of the same slab class, so smaller items can’t use it. The pages stay attached to the large classes, and the space in them is wasted.
Diagnosis:
Examine the stats slabs output. Look for slab classes with a high total_pages but a low used_chunks relative to total_chunks. This indicates that pages have been assigned to the class but many of their chunks are not currently in use. Focus on classes with a larger chunk_size.
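The check above can be automated once the per-class numbers are in a dictionary. This sketch flags classes that hold many pages but use few of their chunks. The function name underused_classes and both thresholds are hypothetical, and the sample numbers are made up for illustration.

```python
def underused_classes(slabs, min_pages=10, max_used_ratio=0.25):
    """Return (class_id, chunk_size, used_ratio) for classes holding many mostly empty pages."""
    flagged = []
    for class_id, s in sorted(slabs.items()):
        if s["total_pages"] < min_pages or s["total_chunks"] == 0:
            continue  # too small to matter, or nothing allocated
        used_ratio = s["used_chunks"] / s["total_chunks"]
        if used_ratio <= max_used_ratio:
            flagged.append((class_id, s["chunk_size"], used_ratio))
    return flagged

# Made-up numbers for illustration, not real server output.
slabs = {
    1:  {"chunk_size": 96,     "total_pages": 400, "total_chunks": 4368800, "used_chunks": 4368000},
    30: {"chunk_size": 103496, "total_pages": 200, "total_chunks": 2000,    "used_chunks": 40},
}
for class_id, chunk_size, ratio in underused_classes(slabs):
    print(f"class {class_id}: chunk_size={chunk_size}, {ratio:.0%} of chunks in use")
```

In this made-up data, class 30 holds 200 pages with almost nothing in them, which is the pattern this cause produces.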
Fix:
Restarting memcached with adjusted slab class definitions is the primary solution. The key here is to make sure the growth factor suits the distribution of your item sizes and that the maximum item size (-I) is no larger than you need. The default maximum is 1MB. If it has been raised (say, to 100MB) but you have very few items larger than 1MB, you don’t need slab classes that large.
Example restart command (if you want to limit very large slabs):
memcached -m 10240 -f 1.5 -n 64 -I 1048576 # -I limits max item size to 1MB
This restarts memcached with a 10GB memory limit, a growth factor of 1.5, a minimum item size of 64 bytes, and explicitly limits the maximum item size to 1MB. Any item larger than 1MB will be rejected.
Why it works: By limiting the maximum item size, you prevent memcached from dedicating pages to slab classes that are unlikely to be used efficiently when large items are infrequent. Memcached does not split items itself: anything over the limit is rejected, so the application has to split or compress oversized values before storing them.
Cause 3: Inefficient key/value sizing
Sometimes, the issue isn’t the number of items, but how they are structured. Storing many very small, distinct pieces of data as individual memcached keys, when they could logically be grouped, leads to overhead for each key-value pair (key string, internal structures).
Diagnosis:
Analyze your application’s data patterns. Are you storing individual user preferences, each as a separate key? Or caching small fragments of HTML? The stats items and stats slabs output can give clues, but this often requires application-level insight. If curr_items is very high and memory usage is also high, but the average item size (bytes divided by curr_items, both from stats) is small, this is a strong indicator.
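The average item size is a one-line calculation once you have the stats reply. A minimal sketch, assuming the reply to the general stats command contains the bytes and curr_items counters; the function name average_item_bytes is hypothetical and the sample values are made up.

```python
def average_item_bytes(stats_text):
    """Compute bytes / curr_items from the reply to the general stats command."""
    stats = {}
    for line in stats_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    items = int(stats["curr_items"])
    return int(stats["bytes"]) / items if items else 0.0

# Made-up sample values, not real server output.
sample = "STAT curr_items 2000000\r\nSTAT bytes 240000000\r\nEND\r\n"
print(average_item_bytes(sample))
```

An average close to the smallest chunk size suggests that per-item overhead makes up a large share of what you are storing.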
Fix:
Modify your application to aggregate smaller pieces of data into larger, single memcached entries. For example, instead of caching user:1:pref:color, user:1:pref:font, user:1:pref:theme as three separate keys, store them as a single memcached entry keyed as user:1:prefs with a serialized value (e.g., JSON or a custom binary format) containing all preferences.
Why it works: This reduces the overhead associated with each individual key-value pair. Memcached’s slab allocator is more efficient when dealing with fewer, larger chunks of data rather than a multitude of tiny ones.
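A minimal sketch of the aggregation, using JSON for the serialized value. The helper names are hypothetical, and a plain dict stands in for the memcached client so the example runs on its own; with a real client you would pass the same key and value to its set and get calls.

```python
import json

def prefs_key(user_id):
    """One key per user instead of one key per preference."""
    return f"user:{user_id}:prefs"

def pack_prefs(prefs):
    """Serialize all preferences into one compact JSON value."""
    return json.dumps(prefs, separators=(",", ":"), sort_keys=True).encode("utf-8")

def unpack_prefs(value):
    """Decode the stored value back into a dict of preferences."""
    return json.loads(value.decode("utf-8"))

cache = {}  # stand-in for a memcached client (assumption for illustration)
cache[prefs_key(1)] = pack_prefs({"color": "blue", "font": "serif", "theme": "dark"})

prefs = unpack_prefs(cache[prefs_key(1)])
print(len(cache), prefs["theme"])
```

The trade-off is that changing one preference now means rewriting the whole entry, so this fits data that is read together and updated rarely.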
Cause 4: Memory Fragmentation within slabs
Even with a good slab distribution, memory can be wasted inside the chunks themselves. All chunks in a slab class are the same size, so an item smaller than its chunk leaves the remainder unused. An item that is updated with a different size may move to another class and leave a free chunk behind in the old one. The free and padded space is there, but it can’t be combined to serve items in other classes.
Diagnosis:
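This is harder to diagnose directly with standard stats. You’ll often see high memory usage and evictions, while stats slabs looks reasonably balanced in terms of used_chunks vs. total_chunks for the relevant classes. On versions that report mem_requested per class, comparing it with used_chunks multiplied by chunk_size shows how much space is lost to padding. This is a more subtle form of waste.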
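Fix:
The most effective fix is to restart memcached. This reinitializes the slab allocator and releases every page from its slab class, at the cost of emptying the cache. Regularly scheduled restarts (e.g., nightly or weekly) can prevent this from becoming a significant issue. Releases that include the slab rebalancer (enabled with -o slab_reassign,slab_automove where it is not already on) can move pages between classes without a restart.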
Why it works: A restart completely tears down and rebuilds the memory pools, so every page is unassigned again and can go to whichever slab class needs it.
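Cause 5: Too many slab classes (growth factor too low)
This is the flip side of the fix for Cause 1. If the growth factor (-f) is too small (e.g., 1.05 or less), memcached creates a very large number of slab classes. While this seems good for fine-grained allocation, memory is handed to classes one page at a time, so every class that has ever held an item keeps at least one page, even if most of its chunks stay empty. Memcached also caps the number of slab classes, so with a very small factor the classes can run out well before the maximum item size, leaving the last class to cover a very wide range of sizes.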
Diagnosis:
Examine stats slabs. You’ll see a very long list of slab classes, many of which might have used_chunks of 0 or very low numbers, but still have a non-zero total_pages and total_chunks.
Fix:
Restart memcached with a slightly larger growth factor. The default is 1.25.
Example restart command:
memcached -m 10240 -f 1.25 -n 64
Why it works: A larger growth factor reduces the total number of distinct slab classes, so fewer pages are tied up in classes that hold only a handful of items.
Cause 6: Incorrect -n (minimum item size)
The -n flag sets the minimum space memcached allocates for an item’s key, value, and flags (the default is 48 bytes). If this is set too high, small items will be padded up to this minimum size, wasting memory.
Diagnosis:
Check your memcached startup options. If -n is set to, for example, 128, but you frequently store items smaller than that, you’re wasting space.
Fix:
Restart memcached with a smaller -n value, such as the default of 48.
Example restart command:
memcached -m 10240 -f 1.25 -n 48
Why it works: The smallest chunk is sized from the -n value plus the item header, so lowering -n lowers the smallest chunk size. An item smaller than that still occupies a whole chunk, but small items now waste far less space than they would with -n set to, say, 128.
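After addressing these issues, expect get_misses to climb and the hit rate to drop for a while, because every restart empties the cache and it has to warm up again. Don’t be alarmed that curr_items doesn’t match total_items: total_items counts every item stored since startup, while curr_items counts only what is in the cache right now. If evictions stay high after the changes, the new settings may have been too aggressive, so check stats slabs again to see how the new class layout lines up with your item sizes.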