Memcached itself doesn’t offer built-in replication or clustering. High availability must be implemented at the client library level.

Let’s see this in action. Imagine a simple Python application that needs to read from Memcached.

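from pymemcache.client.hash import HashClient

# Instead of a single server, we provide a list of servers
client = HashClient(
    [('127.0.0.1', 11211), ('127.0.0.1', 11212)],
    # This is the crucial part for HA
    connect_timeout=1,   # seconds to wait when opening a connection
    timeout=1,           # seconds to wait on each socket send/recv
    retry_attempts=3,    # failed tries before a server is marked dead
    retry_timeout=0.1,   # seconds between retry attempts
    dead_timeout=60,     # seconds before a dead server is tried again
)

try:
    client.set('my_key', 'my_value')
    value = client.get('my_key')
    if value is None:
        print("Cache miss for my_key")
    else:
        print(f"Successfully retrieved: {value.decode('utf-8')}")
except Exception as e:
    print(f"An error occurred: {e}")

Here, HashClient from pymemcache is configured with multiple Memcached server addresses. There is no primary server: each key is hashed to one of the two servers, so both hold part of the cache. If a server (say 127.0.0.1:11211) stops responding, the client retries it, marks it dead after retry_attempts failures, and routes that server's keys to the remaining server (127.0.0.1:11212) for subsequent operations. The data itself is not copied, so those keys start out as cache misses on the new server. The retry_attempts and retry_timeout parameters control how aggressively the client retries a server before giving up and switching, and dead_timeout controls how long it waits before trying the dead server again.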

The problem Memcached high availability solves is the single point of failure inherent in a single-instance setup. If that one Memcached server goes down, your application loses its cache, leading to increased load on your primary data stores and potentially slower response times or even outages. Client-side redundancy distributes the load and provides failover capabilities.

Internally, most client libraries implement this using one of two primary strategies: consistent hashing or simple round-robin.

Consistent Hashing: This is the more sophisticated approach. Keys are hashed onto a ring. Each Memcached server is also hashed onto the ring, usually at many points. When a key needs to be stored or retrieved, the client finds the key’s position on the ring and assigns it to the next server clockwise. The magic of consistent hashing is that when a server is added or removed, only the keys adjacent to that server's points are remapped, which minimizes cache churn and disruption. Libraries differ in the details: pymemcache's HashClient defaults to rendezvous hashing, a related scheme with the same minimal-remapping property, while python-memcached picks a server by taking the key's hash modulo the number of servers, which remaps most keys when the server list changes.
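
To make the ring concrete, here is a minimal sketch of a consistent-hash ring using only the standard library. It is an illustration under stated assumptions, not the implementation any particular client uses: the HashRing class, the choice of MD5, the 100 virtual nodes per server, and the server names are all made up for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (a large integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place the server at `vnodes` pseudo-random points on the ring.
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def get(self, key):
        # First ring point clockwise from the key, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["mc1:11211", "mc2:11211", "mc3:11211"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.get(k) for k in keys}

ring.remove("mc3:11211")  # simulate losing one server
after = {k: ring.get(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that moved used to live on the removed server.
print(len(moved), all(before[k] == "mc3:11211" for k in moved))
```

Keys that belonged to the two surviving servers keep their owner; only the removed server's share is redistributed.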

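Round-Robin: A simpler approach where the client cycles through the list of available servers for each operation. The first request might go to server A, the second to server B, the third back to A, and so on. This is easier to implement, but because the server is chosen without looking at the key, a read can land on a server that never received the corresponding write. It only works for a cache if every write goes to all servers or the application tolerates a high miss rate. It can also lead to uneven distribution if servers have different capacities or performance characteristics, and it’s less resilient to temporary network glitches between specific client-server pairs.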
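
A short sketch shows why round-robin is awkward for a key-value cache. Two in-memory dictionaries stand in for two independent Memcached servers; rr_set and rr_get are hypothetical helpers written for this example, not functions from any client library.

```python
from itertools import cycle

# Two in-memory dicts stand in for two independent Memcached servers.
servers = {"A": {}, "B": {}}
next_server = cycle(servers)  # yields "A", "B", "A", ...


def rr_set(key, value):
    """Write to whichever server is next in the rotation."""
    servers[next(next_server)][key] = value


def rr_get(key):
    """Read from whichever server is next in the rotation."""
    return servers[next(next_server)].get(key)


rr_set("my_key", "my_value")  # lands on server A
result = rr_get("my_key")     # lands on server B, which never saw the key
print(result)                 # None: a cache miss
```

The write and the read go to different servers, so the read misses even though the value is cached on A.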

The core levers you control are the list of Memcached server addresses provided to the client library and the retry/timeout configurations. The parameter names below are those of pymemcache's HashClient; other clients expose similar settings under different names.

  • Server List: [('host1', port1), ('host2', port2), ...] This is the fundamental HA configuration. More servers mean each failure takes out a smaller share of the cache.
  • connect_timeout: How long the client waits for a connection to be established. A lower value (e.g., 0.5 seconds) promotes faster failover but might prematurely mark a temporarily sluggish server as down.
  • timeout: How long the client waits on each socket send or receive, which in practice bounds the wait for a response. Similar trade-offs to connect_timeout.
  • retry_attempts: The number of times the client will try a specific server before marking it dead and removing it from the pool. Higher values increase the chance of a slow server eventually responding but delay failover.
  • retry_timeout: The time that should pass between retry attempts for a single server.
  • dead_timeout: How long a dead server stays out of the pool before the client tries to add it back.
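
The interplay between a retry limit and a cool-down period can be sketched as a small state tracker. This is an assumption-laden illustration, not code from any client library: ServerHealth, its retry_attempts and dead_timeout arguments, and the explicit `now` timestamps are hypothetical names chosen for this example.

```python
class ServerHealth:
    """Tracks failures for one server (hypothetical helper, not a library API)."""

    def __init__(self, retry_attempts=3, dead_timeout=60.0):
        self.retry_attempts = retry_attempts
        self.dead_timeout = dead_timeout
        self.failures = 0
        self.dead_since = None  # timestamp at which the server was marked dead

    def record_failure(self, now):
        self.failures += 1
        if self.failures >= self.retry_attempts:
            self.dead_since = now  # too many failures: take it out of rotation

    def record_success(self):
        self.failures = 0
        self.dead_since = None

    def is_usable(self, now):
        if self.dead_since is None:
            return True
        # Once dead_timeout has elapsed, give the server another chance.
        return now - self.dead_since >= self.dead_timeout


health = ServerHealth(retry_attempts=3, dead_timeout=60.0)
health.record_failure(now=0.0)
health.record_failure(now=0.1)
usable_after_two = health.is_usable(now=0.2)    # still in rotation
health.record_failure(now=0.2)
usable_after_three = health.is_usable(now=0.3)  # marked dead
usable_later = health.is_usable(now=61.0)       # cool-down has elapsed
print(usable_after_two, usable_after_three, usable_later)
```

A lower retry limit takes a failing server out of rotation sooner, and a shorter cool-down brings it back sooner, at the cost of more wasted attempts if it is still down.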

When using consistent hashing, the client library does not store a table of keys. It keeps track of which servers it currently considers available and computes the owner of each key from that set, so the mapping changes as servers come and go. The client attempts each operation on the designated server. If that server fails (e.g., connection refused, timeout), the client retries according to its settings and, once it considers the server unavailable, sends the key to the next server on the consistent hash ring. The key is then effectively "assigned" to this new server until the original one returns. Nothing is migrated in either direction, so the first read after a failover is a cache miss, and so is the first read after the original server comes back empty.
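
The failover behaviour described above can be sketched with fake in-memory servers. This is a simplified model under stated assumptions: FailoverCache and score are hypothetical names, dictionaries stand in for Memcached instances, and the server is chosen by picking the highest hash score among live servers (rendezvous-style) rather than by walking a ring, which keeps the example short while preserving the same failover property.

```python
import hashlib


def score(server, key):
    """Deterministic pseudo-random score for a (server, key) pair."""
    return hashlib.md5(f"{server}|{key}".encode("utf-8")).hexdigest()


class FailoverCache:
    """Toy client that routes each key to one live server (illustrative only)."""

    def __init__(self, names):
        self.stores = {name: {} for name in names}  # fake servers
        self.down = set()                           # servers considered dead

    def pick(self, key):
        live = [s for s in self.stores if s not in self.down]
        if not live:
            raise ConnectionError("no servers available")
        # The live server with the highest score owns the key.
        return max(live, key=lambda s: score(s, key))

    def set(self, key, value):
        self.stores[self.pick(key)][key] = value

    def get(self, key):
        return self.stores[self.pick(key)].get(key)


cache = FailoverCache(["mc1:11211", "mc2:11212"])
cache.set("my_key", "my_value")
owner = cache.pick("my_key")

cache.down.add(owner)            # simulate the owning server failing
fallback = cache.pick("my_key")  # the key is now routed to the other server
miss = cache.get("my_key")       # the fallback has no copy of the value

cache.set("my_key", "my_value")  # the application repopulates the cache
hit = cache.get("my_key")
print(owner != fallback, miss, hit)
```

The request succeeds after failover, but the value has to be reloaded from the primary data store, which is why a failed node still shows up as extra backend load.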

What surprises most people is that "availability" here is purely a client-side illusion. Memcached servers share nothing and don’t know about each other. If you have two Memcached instances, mc1:11211 and mc2:11212, and your client is configured with both, mc1 has no idea mc2 exists, nor does it know if mc2 is serving requests for keys that mc1 could have served. The client library is the sole orchestrator, deciding which server gets which key and what to do when one fails. It also means that every client must be configured with the same server list and hashing scheme, or different clients will look for the same key on different servers.

The next step in managing Memcached performance often involves understanding cache invalidation strategies.
