Memcached CPU Threads: Scale Workers for High Throughput (2026)

Memcached can bottleneck on CPU if it’s not configured with enough worker threads to handle concurrent requests.

Here’s how to scale your Memcached workers and why it matters:

# Example: Starting Memcached with 8 worker threads
memcached -m 64 -c 1024 -t 8

Memcached uses a pool of worker threads to service incoming requests. When started without the -t flag, it runs four worker threads, which can become a bottleneck under heavy load on hosts with more cores. When the worker threads are saturated, requests queue up, leading to increased latency and reduced throughput. Raising the thread count toward the number of available cores lets Memcached process more requests in parallel, directly addressing CPU-bound performance issues.

Common Causes of Memcached CPU Bottlenecks

Default Thread Count: When started without the -t flag, Memcached uses its built-in default of four worker threads (-t 4), and some init scripts or config files set it lower. On hosts with more cores than worker threads, the threads can saturate while the remaining cores sit idle.
- Diagnosis: Check your running Memcached process. On Linux, ps aux | grep memcached shows the command-line arguments. Look for the -t flag; if it is absent, the default applies. The threads value in the stats output reports the count actually in use.
- Fix: Restart Memcached with a -t value matched to the available cores. For a 4-core CPU, -t 4 is a reasonable baseline; for 8 cores, -t 8 is a good starting point. Setting -t higher than the number of cores is typically not useful.
- Why it works: This explicitly tells Memcached to create and utilize that many threads for request processing.
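The diagnosis step above can be scripted. The following is a minimal sketch, not an official tool: the function name memcached_threads is made up for illustration, and it assumes the thread count is passed as -t N, -tN, or --threads=N on the command line.

```shell
# Print the worker-thread count implied by a memcached command line.
# Reads one command line per input line on stdin. Falls back to 4
# (memcached's default) when no thread option is present.
memcached_threads() {
  awk '{
    n = 4
    for (i = 1; i <= NF; i++) {
      if ($i == "-t" && i < NF)      n = $(i + 1)
      else if ($i ~ /^-t[0-9]+$/)    n = substr($i, 3)
      else if ($i ~ /^--threads=/)   n = substr($i, 11)
    }
    print n
  }'
}
```

On a Linux host, something like ps -o args= -C memcached | memcached_threads should print the configured count, which you can then compare against the output of nproc.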
Under-provisioned CPU Resources: Even with multiple threads, if the underlying server lacks sufficient CPU cores, Memcached will still struggle.
- Diagnosis: Monitor your server’s CPU utilization using tools like top, htop, or sar. Look for sustained high CPU usage (above 80-90%) across all cores.
- Fix: If CPU is maxed out, you need more CPU power. This could mean migrating to a larger instance type, adding more CPU cores to your existing instance, or optimizing other processes consuming CPU on the same host.
- Why it works: More CPU cores allow the operating system to schedule more Memcached worker threads (and other processes) simultaneously, reducing contention and improving overall throughput.
Inefficient Client Libraries or Network Stack: Sometimes the bottleneck isn’t Memcached itself, but how clients interact with it. Every request carries a fixed cost on the server for reading from the socket, parsing the command, and writing the response. A high volume of small, individual get operations, or clients that open a new connection for each request, multiplies that fixed cost and shows up as CPU load on the Memcached server.
- Diagnosis: Use network monitoring tools (like iftop, nload) to check bandwidth usage. On the client side, profile your application to see the rate of Memcached operations. If you see a very high rate of individual get operations instead of batching gets, this is a clue.
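- Fix: Batch get operations on the client side where possible by fetching several keys in one request (a single get command with multiple keys, usually exposed by client libraries as a multi-get call). Note that gets is not the batching command: it is the variant of get that also returns CAS tokens. Ensure your client libraries are configured for efficient connection pooling.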
- Why it works: Batching reduces the number of round trips and network overhead. Efficient connection pooling amortizes the cost of establishing connections, reducing per-request overhead.
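To make the batching idea concrete, here is a small sketch of what a multi-key request looks like in Memcached's text protocol. The helper name multi_get_request and the key names in the usage note are hypothetical; in an application you would normally use your client library's multi-get call instead of building requests by hand.

```shell
# Build one text-protocol "get" request that covers every key passed
# as an argument, terminated with the CRLF the protocol expects.
multi_get_request() {
  printf 'get'
  for key in "$@"; do
    printf ' %s' "$key"
  done
  printf '\r\n'
}
```

For a quick manual check against a local instance, { multi_get_request user:1 user:2 user:3; printf 'quit\r\n'; } | nc localhost 11211 sends three lookups in one round trip rather than three.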
High Cache Miss Rate: If your application is frequently requesting data that isn’t in Memcached, the CPU cycles spent processing these get requests that result in misses are wasted.
- Diagnosis: Monitor Memcached’s get_misses and cmd_get statistics. A high ratio of get_misses to cmd_get indicates frequent misses. echo "stats" | nc localhost 11211 is your friend here.
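- Fix: Increase the total memory allocated (-m flag, in megabytes) so more items fit in the cache. The -I flag does not control how many items are stored: it sets the maximum size of a single item (default 1 MB), so raise it only if items are being rejected for being too large. Review your application’s caching strategy and expiration times to ensure frequently accessed data stays in memory.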
- Why it works: More memory means more data can be held in cache, leading to fewer cache misses and thus fewer wasted CPU cycles on unproductive get operations.
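The miss ratio described above can be computed straight from the stats output, which consists of lines of the form STAT name value. This is an illustrative sketch; the function name miss_ratio is an assumption, not part of Memcached.

```shell
# Read "stats" output on stdin and print get_misses as a percentage
# of cmd_get, or "n/a" if no get commands have been recorded yet.
miss_ratio() {
  tr -d '\r' | awk '
    $1 == "STAT" && $2 == "cmd_get"    { total = $3 }
    $1 == "STAT" && $2 == "get_misses" { misses = $3 }
    END {
      if (total > 0) printf "%.1f\n", 100 * misses / total
      else           print "n/a"
    }'
}
```

A typical invocation would be printf 'stats\r\nquit\r\n' | nc localhost 11211 | miss_ratio. Both counters are cumulative since the server started, so the result is a lifetime average rather than a current rate.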
NUMA (Non-Uniform Memory Access) Issues: On multi-socket or multi-NUMA node systems, Memcached threads might be running on a different NUMA node than the memory they are accessing, leading to performance degradation due to cross-node memory access penalties.
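- Diagnosis: Use numactl --hardware (short form: numactl -H) to inspect your NUMA topology. Then use numastat -p <pid> to see how the Memcached process’s memory is spread across nodes, and taskset -cp <pid> to see which CPUs it is allowed to run on. Check whether the memory and the CPUs belong to the same node.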
- Fix: Run Memcached with numactl to bind it to a specific NUMA node and its memory. For example, numactl --cpunodebind=0 --membind=0 memcached -t 8.
- Why it works: This ensures that the Memcached worker threads and the memory they operate on reside on the same NUMA node, minimizing latency associated with inter-node memory access.
Large Item Evictions: If Memcached is constantly evicting items due to memory pressure, the CPU cycles involved in the eviction process can add up.
- Diagnosis: Monitor evictions in the stats output. High eviction rates, especially if they correlate with CPU spikes, indicate memory pressure.
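- Fix: Increase the allocated memory (-m flag). The -I flag only raises the maximum size of a single item; it does not add capacity, so it will not reduce evictions on its own. If large items dominate the cache, consider whether they can be compressed or split on the client side.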
- Why it works: More memory means fewer items need to be evicted, freeing up CPU cycles that would otherwise be spent managing the eviction process.
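Because evictions is a cumulative counter, a rate is more useful than the raw value when correlating with CPU spikes. The sketch below takes two samples and divides the difference by the interval; the helper names stat_value and evictions_per_second are hypothetical.

```shell
# Print the value of one named counter from "stats" output on stdin.
stat_value() {
  tr -d '\r' | awk -v name="$1" '$1 == "STAT" && $2 == name { print $3 }'
}

# Given an earlier sample, a later sample, and the seconds between
# them, print the average number of evictions per second.
evictions_per_second() {
  awk -v before="$1" -v after="$2" -v secs="$3" \
    'BEGIN { printf "%.1f\n", (after - before) / secs }'
}
```

In practice you would capture stat_value evictions from two stats calls taken a known number of seconds apart (for example with sleep 10 in between) and pass both values to evictions_per_second.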

After addressing CPU threading, the next common issue you’ll encounter is network saturation if your Memcached server can now handle requests faster than the network interface can serve them.