Pre-loading Memcached after a restart isn’t about "warming it up" in the sense of bringing a car engine to operating temperature; it’s about ensuring your application doesn’t grind to a halt when it hits the cache for the first time after a deployment or outage.

Let’s see this in action. Imagine a web application that serves product details. Without pre-loading, the first request for a popular product might look like this:

  1. User requests /products/123.
  2. Application checks Memcached for product:123. Cache miss.
  3. Application queries the database for product 123. (Slow!)
  4. Application receives data from the database.
  5. Application stores product:123 in Memcached (e.g., with a 5-minute TTL).
  6. Application returns data to the user.

The next user requesting /products/123 gets a cache hit. But that first miss adds noticeable latency for the user and, multiplied across many keys and concurrent users, produces a surge of load on the database. Pre-loading aims to populate Memcached before users start hitting the application.
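The six steps above can be sketched in shell using Memcached's text protocol. This is only an illustration: mc_parse_get and get_product are hypothetical names, the connection details and the products table are assumptions, and the sketch assumes a numeric product ID and single-line JSON values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the reply to a single-key "get" on stdin.
# Prints the value and returns 0 on a hit; returns 1 on a miss.
mc_parse_get() {
  local header value
  IFS= read -r header || return 1
  case $header in
    VALUE\ *) IFS= read -r value; printf '%s\n' "${value%$'\r'}" ;;
    *) return 1 ;;   # a bare "END" means the key is not cached
  esac
}

# Steps 1-6 for one product (assumes nc and psql are installed).
get_product() {
  local id=$1 data bytes
  # Step 2: check Memcached first.
  if data=$(printf 'get product:%s\r\nquit\r\n' "$id" | nc 127.0.0.1 11211 | mc_parse_get); then
    printf '%s\n' "$data"
    return 0
  fi
  # Steps 3-4: cache miss, so take the slow path to the database.
  data=$(psql -U myuser -d mydb -t -A -c "SELECT row_to_json(p) FROM products p WHERE id = ${id}")
  # Step 5: store the result with a 5-minute TTL; the protocol needs the byte length.
  bytes=$(printf '%s' "$data" | wc -c | tr -d '[:space:]')
  printf 'set product:%s 0 300 %s\r\n%s\r\nquit\r\n' "$id" "$bytes" "$data" | nc 127.0.0.1 11211 >/dev/null
  # Step 6: return the data to the caller.
  printf '%s\n' "$data"
}
```

If Memcached is unreachable, the parser sees no reply and treats it as a miss, so the request still succeeds via the database.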

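The core problem Memcached pre-loading solves is the "thundering herd" problem (also called a cache stampede) on your primary data store immediately following a cache or application restart. When Memcached is empty, every single cache miss translates directly into a database query, and concurrent requests for the same hot key can each trigger their own query before the first one has refilled the cache. Memcached keeps its data only in memory, so a restarted instance comes back empty regardless of the TTLs its entries had. If every entry is then written with the same Time-To-Live (TTL) of, say, 300 seconds (5 minutes), the keys refilled in the first moments after the restart also expire together about five minutes later, which can set off a second wave of misses.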

Here’s how it works internally. Memcached is a key-value store. When you store data, you provide a key (a string) and a value (a blob of data), along with an optional TTL. The application code is responsible for deciding what data to fetch and when to put it into Memcached. Pre-loading simply means running this "fetch and store" logic proactively, often via a separate script or a dedicated process that runs before the main application instances are fully available to serve user traffic.

The exact levers you control are the keys you choose to pre-load, the order in which you fetch them, and the TTL you assign. For a product catalog, you might pre-load:

  • The top N most frequently viewed products.
  • Products in categories that are currently being advertised.
  • Products that have recently been updated.

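A common approach is to have a script that iterates through a list of critical keys, fetches the corresponding data from the database, and then writes each value with a client library’s set (or add) command, or with the raw text protocol over a tool such as nc. Note that memcached-tool, which ships with Memcached, only reports stats and dumps existing entries; it cannot store values.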

For example, to pre-load a specific product:

# Assume PRODUCT_ID="123", MEMCACHED_HOST="127.0.0.1", MEMCACHED_PORT="11211"
# -t -A makes psql print only the bare value, with no header or padding
PRODUCT_DATA=$(psql -U myuser -d mydb -t -A -c "SELECT row_to_json(p) FROM products p WHERE id = ${PRODUCT_ID}")
# The set command must state the payload length in bytes
BYTES=$(printf '%s' "$PRODUCT_DATA" | wc -c | tr -d ' ')
# Protocol lines end in CRLF; quit makes the server close the connection so nc exits
printf 'set product:%s 0 300 %s\r\n%s\r\nquit\r\n' "$PRODUCT_ID" "$BYTES" "$PRODUCT_DATA" | nc "$MEMCACHED_HOST" "$MEMCACHED_PORT"

This script first queries the database for the product as a single JSON object, then uses nc (netcat) to send a set command directly to Memcached. In that command, 0 is the flags field (unused here), 300 is the TTL in seconds, and the last number is the length of the payload in bytes, which the text protocol requires. Memcached answers STORED on success. In a real application, you’d use a proper Memcached client library in your preferred language (Python, Ruby, Node.js, etc.) for more robust error handling and data serialization.
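Extending the single-product example to a whole list of keys, the sketch below turns rows of "id, JSON" into a stream of set commands that can be sent over one connection. The function name mc_preload_frames is hypothetical, and the tab-separated input format is an assumption chosen to match what psql can emit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "id<TAB>json" lines on stdin and print one
# Memcached set command per line, followed by quit.
# Usage: mc_preload_frames TTL_SECONDS
mc_preload_frames() {
  local ttl=$1 id json bytes
  while IFS=$'\t' read -r id json; do
    [ -n "$id" ] || continue   # skip blank lines
    # The protocol wants the payload length in bytes, so count with wc -c.
    bytes=$(printf '%s' "$json" | wc -c | tr -d '[:space:]')
    printf 'set product:%s 0 %s %s\r\n%s\r\n' "$id" "$ttl" "$bytes" "$json"
  done
  printf 'quit\r\n'            # ask the server to close the connection
}
```

One possible way to feed it, assuming a products table with a view_count column (an invented column name) to rank the top 100 products:

  psql -U myuser -d mydb -t -A -F "$(printf '\t')" -c "SELECT id, row_to_json(p) FROM products p ORDER BY view_count DESC LIMIT 100" | mc_preload_frames 300 | nc 127.0.0.1 11211

All rows come from a single query, so the database sees one request rather than one per key.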

The most surprising thing is how often pre-load scripts themselves become bottlenecks or fail silently. If your pre-load script queries the database in a way that’s not optimized (e.g., N+1 queries instead of a single query for multiple items), it can overwhelm the database during the pre-load phase, defeating the purpose. You might also serialize values, or build keys, differently from the application. Memcached stores whatever bytes it is given, so if the script writes JSON under product:123 while the application expects another format or key scheme, the application either fails to decode the entry or never finds it, and the database still gets queried after the data is "loaded."
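One way to keep a pre-load from failing silently is to count the server's replies and exit non-zero when any write was not acknowledged. The sketch below assumes the replies of a text-protocol session are piped in on stdin; mc_check_stored is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Memcached replies on stdin and fail unless
# exactly EXPECTED of them are "STORED".
# Usage: mc_check_stored EXPECTED
mc_check_stored() {
  local expected=$1 stored
  # Replies end in CRLF, so strip \r before matching whole lines.
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  stored=$(tr -d '\r' | grep -c -x 'STORED' || true)
  if [ "$stored" -ne "$expected" ]; then
    echo "pre-load incomplete: $stored of $expected keys stored" >&2
    return 1
  fi
}
```

Placed at the end of a pipeline that sends 100 set commands through nc, a call such as mc_check_stored 100 makes the script's exit status reflect whether every key was actually written, which a deployment step can then act on.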

The next challenge you’ll face is managing the TTLs effectively, especially when dealing with data that changes frequently.
