Memcached’s hashing algorithm for distributing keys across its nodes is surprisingly simple, yet remarkably effective at avoiding hotspots.

Let’s see it in action. Imagine you have a cache cluster and you’re setting and getting a bunch of keys.

# On a client machine:
# Install memcached client library (e.g., for Python)
# pip install python-memcached

import memcache

# Connect to your memcached cluster (replace with your server IPs/ports)
mc = memcache.Client(['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211'], debug=0)

# Set some keys
mc.set("user:12345", "{\"name\": \"Alice\", \"email\": \"alice@example.com\"}")
mc.set("product:abcde", "{\"name\": \"Widget\", \"price\": 19.99}")
mc.set("session:xyz789", "some_session_data")
mc.set("cache:config:general", "{\"timeout\": 300}")
mc.set("user:54321", "{\"name\": \"Bob\", \"email\": \"bob@example.com\"}") # Notice key similarity

# Get some keys
print(mc.get("user:12345"))
print(mc.get("product:abcde"))
print(mc.get("session:xyz789"))
print(mc.get("cache:config:general"))
print(mc.get("user:54321"))

When mc.set("user:12345", ...) is called, the python-memcached library (or any other memcached client) doesn’t just randomly pick a server. It takes the key string "user:12345", feeds it into a hashing function, and uses the resulting hash to determine which of the configured memcached servers (10.0.0.1:11211, 10.0.0.2:11211, etc.) should store or retrieve that data. The goal is to make sure that over a large number of keys, the hashes are spread out evenly, meaning no single memcached server gets overloaded with requests.

The magic behind this even distribution is a hashing algorithm called MurmurHash. Specifically, memcached uses MurmurHash3, a non-cryptographic hash function known for its speed and excellent distribution properties. The client library implements this algorithm. When you tell a client about your memcached servers, it typically uses a consistent hashing scheme. This means that if you add or remove a server from the cluster, only a small fraction of keys will need to be remapped, rather than remapping all of them.

Here’s how it breaks down:

  1. Key Input: The client takes the string key (e.g., "user:12345").
  2. Hashing: The key is passed to the MurmurHash3 algorithm. This algorithm generates a 32-bit or 128-bit integer hash value. For memcached, it’s typically a 32-bit hash. The brilliance of MurmurHash is that even small changes in the input key result in drastically different hash outputs, and for a large set of random keys, the resulting hash values will be spread very uniformly across the entire possible range of hash values.
  3. Modulo Operation: The resulting hash value is then subjected to a modulo operation with the number of available memcached servers. For example, if you have 3 servers, the hash value H is transformed into H % 3. This result (0, 1, or 2) directly maps to one of your configured memcached servers.
  4. Consistent Hashing (Rebalancing): When you add or remove a server, the total number of servers changes. If you have N servers and add one, making it N+1, the modulo operation H % (N+1) will change the target server for some keys. Consistent hashing algorithms, like the one used by most memcached clients, employ techniques (often involving a ring data structure) to minimize the number of keys that need to be moved. Instead of a random redistribution, only keys whose hashes fall into a specific "segment" of the hash ring are affected.

The most surprising aspect is how this simple modulo operation, when applied to the output of a high-quality hash function like MurmurHash3, provides such robust load balancing without complex coordination between memcached servers themselves. The intelligence is entirely in the client.

The actual implementation of MurmurHash3 in a client library might look something like this (simplified Python example):

# This is a conceptual representation, not actual memcached client code.
# Real clients use optimized C libraries for performance.

def murmurhash3_32(key_bytes, seed=0):
    # ... complex MurmurHash3 algorithm implementation ...
    # For demonstration, we'll use Python's built-in hash and simulate randomness.
    # THIS IS NOT A REAL MURMURHASH IMPLEMENTATION!
    return hash(key_bytes.decode('utf-8') + str(seed)) & 0xFFFFFFFF

def get_server_index(key, servers, seed=0):
    hash_val = murmurhash3_32(key.encode('utf-8'), seed)
    return hash_val % len(servers)

servers = ['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211']
key1 = "user:12345"
key2 = "user:54321" # Similar key

index1 = get_server_index(key1, servers)
index2 = get_server_index(key2, servers)

print(f"Key '{key1}' maps to server index: {index1}")
print(f"Key '{key2}' maps to server index: {index2}")
# Notice how similar keys can still map to different servers due to hash function properties.

The "seed" value in MurmurHash can also be used to create different hashing behaviors for the same set of keys if needed, though typically it’s kept constant for a given cluster configuration.

Understanding how the client hashes keys is crucial when debugging uneven load. If you observe one memcached server consistently receiving a disproportionate amount of traffic, the root cause is almost always in the keys themselves. Keys that are too similar, or keys that exhibit patterns (e.g., all starting with "user:"), can sometimes lead to hash collisions or clustering if the hash function isn’t robust enough. MurmurHash3 is generally robust enough that this is rare, but it’s a possibility. More often, it’s a sign that your application is generating keys in a way that isn’t truly random across the key space.

The next step after understanding key distribution is exploring how memcached handles its eviction policies when the cache fills up.

Want structured learning?

Take the full Memcached course →