Memcached Cache Stampede: Prevent Thundering Herd (2026)

The most surprising thing about memcached cache stampedes is that they aren’t caused by memcached itself, but by the application accessing memcached.

Imagine this: your application needs a piece of data from memcached. It’s not there. So, your application goes to the database to fetch it, then writes it back to memcached. Now, imagine a hundred user requests all hit at the exact same time, and that data isn’t in the cache for any of them. Each of those hundred requests will independently decide the data isn’t there, and each will go to the database. That’s the thundering herd.

Here’s a typical scenario. A web server (let’s call it web-01) is serving requests. It has a memcached cluster (memcached-a, memcached-b) behind it.

// Application code (simplified Python-like pseudocode)
def get_user_data(user_id):
    cache_key = f"user:{user_id}"
    data = memcached_client.get(cache_key)
    if data is None:
        // Cache miss!
        print(f"Cache miss for {cache_key}")
        data = database.fetch_user(user_id)
        memcached_client.set(cache_key, data, ttl=3600) // Cache for 1 hour
    return data

Now, let’s say user:123’s data expires from memcached. A flood of requests for user:123 arrives simultaneously.

// Imagine these 100 requests happening within milliseconds of each other
get_user_data(123)
get_user_data(123)
get_user_data(123)
... (100 times)

Each of these get_user_data(123) calls will execute the if data is None: block. They all miss the cache. They all hit database.fetch_user(123). This is the stampede. The database, designed for concurrent reads but not necessarily a sudden, massive spike of identical, heavy reads, can become overloaded.

The core problem is that multiple application instances, upon detecting a cache miss, independently decide to re-fetch the data and repopulate the cache. The "fix" is to ensure that only one instance does this work, and others wait for its result.

The most common and effective pattern to prevent this is called "locking" or "single flight." When an application instance detects a cache miss, it doesn’t immediately fetch from the database. Instead, it tries to acquire a lock for that specific cache key.

Here’s how a locked approach works:

Acquire Lock: The application attempts to acquire a distributed lock for the cache key (e.g., lock:user:123). This lock is short-lived, perhaps only a few seconds. If it successfully acquires the lock, it proceeds to fetch from the database. If it fails to acquire the lock, it means another instance is already fetching the data.
Wait/Retry: If it fails to acquire the lock, the application waits for a short, random interval (e.g., 50-200ms) and then retries fetching from memcached. It keeps retrying until it either finds the data (because the instance holding the lock has populated it) or the lock expires.
Fetch & Release: The instance that successfully acquired the lock fetches the data from the database. Once it has the data, it writes it back to memcached and then releases the lock.

This requires a mechanism for distributed locking. You can implement this using:

Redis: Redis is excellent for distributed locks. You can use SETNX (SET if Not eXists) combined with an expiration time.

import redis
import time
import random

r = redis.Redis(host='redis-master', port=6379, db=0)

def get_user_data_locked(user_id):
    cache_key = f"user:{user_id}"
    lock_key = f"lock:{cache_key}"
    data = r.get(cache_key) # Try memcached first (assuming r is also your memcached client for simplicity here)

    if data is None:
        print(f"Cache miss for {cache_key}")
        # Try to acquire lock
        lock_acquired = r.set(lock_key, "locked", nx=True, ex=10) # nx=True means only set if not exists, ex=10 sets expiration to 10 seconds

        if lock_acquired:
            try:
                print(f"Lock acquired for {lock_key}")
                data = database.fetch_user(user_id)
                r.set(cache_key, data, ex=3600) # Cache for 1 hour
                print(f"Data fetched and cached for {cache_key}")
            finally:
                r.delete(lock_key) # Release the lock
                print(f"Lock released for {lock_key}")
        else:
            # Lock not acquired, another process is fetching. Wait and retry.
            print(f"Lock not acquired for {lock_key}, waiting and retrying...")
            time.sleep(random.uniform(0.05, 0.2)) # Wait 50-200ms
            return get_user_data_locked(user_id) # Recursive retry
    return data

# Example usage:
# user_data = get_user_data_locked(123)

The r.set(lock_key, "locked", nx=True, ex=10) command is the magic. nx=True ensures the key is only set if it doesn’t already exist, which is the core of the lock. ex=10 sets an expiration time of 10 seconds, preventing deadlocks if the process holding the lock crashes. If set returns True, you got the lock. If False, someone else has it.

ZooKeeper/etcd: These distributed coordination services offer robust primitives for distributed locking, though they are heavier infrastructure than Redis. The principle is similar: attempt to create an ephemeral, sequential node. If you get the lowest sequence number, you hold the lock.

The critical part of the "wait and retry" is the random backoff. If all waiting processes retried at the exact same time, you’d just shift the stampede. Randomness helps stagger them.

The underlying mechanism is that the first process to successfully acquire the lock becomes the "designated fetcher." All other processes attempting to fetch the same data are effectively told, "Hold on, someone else is on it." They then poll the cache, and as soon as the designated fetcher puts the data back, they’ll find it and move on.

The next thing you’ll likely encounter is how to handle write contention and ensuring data consistency across multiple caches and services.