Memcached can hog network buffers if not configured correctly, leading to increased latency under load.
Let’s see it in action. Imagine a simple application:

    import memcache

    # Connect to a local memcached instance
    client = memcache.Client(['127.0.0.1:11211'], debug=0)

    # Set a value
    client.set("my_key", "my_value")

    # Get the value
    value = client.get("my_key")
    print(f"Retrieved value: {value}")

This looks straightforward, right? But the magic (and potential trouble) happens under the hood, specifically with how memcached and the operating system handle network communication.
The core problem is that memcached and the kernel both move each connection's data through buffers of limited size. When your application sends many requests, or large amounts of data, these buffers can fill up. If memcached is busy processing other requests and can't drain them fast enough, your new requests have to wait. That waiting shows up as latency.
Think of it like a postal service. If the mailboxes (network buffers) are too small and the mail carriers (OS/memcached threads) can’t pick up mail fast enough, the mail starts piling up. Your letter, even if short, has to wait in line.
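The pile-up described above can be reproduced in a few lines. This is a minimal sketch, not memcached's own code: it assumes a local socket pair (created with socket.socketpair, so not a real TCP connection to memcached) whose reading side never reads, and the function name fill_send_buffer is made up for illustration. It writes until the kernel refuses to buffer any more data.

```python
import socket


def fill_send_buffer(chunk_size=4096):
    """Write to a socket whose peer never reads, until the kernel pushes back.

    Returns the number of bytes the kernel accepted before its buffers filled.
    """
    writer, reader = socket.socketpair()
    writer.setblocking(False)  # fail immediately instead of waiting for space
    sent = 0
    try:
        while True:
            sent += writer.send(b"x" * chunk_size)
    except BlockingIOError:
        # The buffers are full. A blocking sender would stall at this point,
        # and that stall is the latency the text describes.
        pass
    finally:
        writer.close()
        reader.close()
    return sent


if __name__ == "__main__":
    print(f"Kernel buffered {fill_send_buffer()} bytes before pushing back")
```

The exact byte count depends on the operating system and its buffer settings, which is why the tuning discussed later changes how soon a sender starts to wait.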
Memcached uses two main types of buffers:
- Receive Buffer: For incoming requests from clients.
- Send Buffer: For outgoing responses back to clients.
The OS also has its own send and receive buffers for each TCP connection. When memcached’s internal buffers are full, it can’t accept new requests. When the OS’s send buffers are full, memcached can’t send data out. Both scenarios cause delays.
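To see the per-connection OS buffers mentioned above, you can ask the kernel what it assigns to a fresh TCP socket. The sketch below uses only the standard socket module; the helper name socket_buffer_sizes is an illustrative choice, and the numbers it reports are the initial sizes for a new socket on your machine, not what a live memcached connection is using.

```python
import socket


def socket_buffer_sizes():
    """Return (receive, send) buffer sizes in bytes for a new TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


if __name__ == "__main__":
    rcv, snd = socket_buffer_sizes()
    print(f"Receive buffer: {rcv} bytes, send buffer: {snd} bytes")
```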
The primary levers you have to tune this are:
- -I <size> (maximum item size, which this article calls max_request_size): The largest single item memcached will accept. The default is 1 MB. It's not directly a buffer size, but it bounds how much data one request can carry. Note that -R <num> is a different option: it caps how many requests memcached processes per connection per event (default 20) so that one busy connection can't starve the others.
- --enable-large-memory: Described here as a compile-time option that allows memcached to use larger memory allocations, which can indirectly help with buffer management if the default system limits are hit. Confirm that ./configure --help for your memcached version lists it before relying on it.
- OS-level TCP buffer tuning: This is often the most impactful. On Linux, net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, and net.ipv4.tcp_wmem control the kernel's socket buffer sizes.
Let’s dive into tuning.
If you’re experiencing high latency, especially when your application is sending many small requests or large values, you’ll want to look at increasing the buffer sizes.
Diagnosis:
First, monitor your system’s network performance. Tools like netstat -s can give you counts of dropped packets or buffer overflows, though they are often aggregated and hard to tie directly to memcached. A more direct approach is to observe memcached’s behavior under load. If you see memcached processes consuming a lot of memory and application latency spikes during high throughput, it’s a strong indicator.
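One way to observe behavior under load is to time the operations yourself and look at the tail, not just the average. The helper below is a rough sketch under stated assumptions: measure_latency is a made-up name, it times any zero-argument callable, and the 99th percentile is taken by simple index into the sorted samples. The runnable part uses a stand-in operation so it works without a memcached server.

```python
import statistics
import time


def measure_latency(operation, iterations=1000):
    """Call operation() repeatedly and summarize per-call latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Simple index-based percentile; good enough for spotting tail latency.
    p99_index = min(len(samples) - 1, int(len(samples) * 0.99))
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real cache call to measure memcached.
    print(measure_latency(lambda: sum(range(100)), iterations=200))
```

With the client from the first example, the call would look like measure_latency(lambda: client.get("my_key")). A p99 that grows much faster than the p50 as load increases suggests requests are queuing somewhere, and buffers are one candidate.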
Tuning max_request_size (the -I option):
If your values are consistently larger than the default item limit of 1 MB, you need to raise it, because memcached rejects items over the limit.
- Check: Examine your application's data payloads. Are you sending large serialized objects or blob data?
- Command: Start memcached with a larger maximum item size. The option for this is -I. The -R option takes a small request count per connection per event, not a byte size, so it is the wrong lever here.

      memcached -p 11211 -m 64 -I 4m   # Raise the maximum item size to 4MB

- Why it works: memcached accepts items up to the new limit instead of rejecting them, so large values no longer fail or have to be split up by the application.
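For the payload check, a quick estimate of serialized size is often enough. The sketch below assumes values are serialized with pickle (your client may use a different serializer or compression) and compares the result with the roughly 1 MB default mentioned above. The names DEFAULT_ITEM_LIMIT, payload_size, and exceeds_limit are illustrative, and the estimate ignores the key and memcached's per-item overhead.

```python
import pickle

# Assumed default limit of 1 MB; adjust if your server is started with a different one.
DEFAULT_ITEM_LIMIT = 1024 * 1024


def payload_size(value):
    """Approximate the stored size of a value by pickling it."""
    return len(pickle.dumps(value))


def exceeds_limit(value, limit=DEFAULT_ITEM_LIMIT):
    """Return True if the pickled value is larger than the given limit."""
    return payload_size(value) > limit


if __name__ == "__main__":
    small = "my_value"
    large = b"x" * (2 * 1024 * 1024)  # a 2 MB blob
    print(f"small: {payload_size(small)} bytes, over limit: {exceeds_limit(small)}")
    print(f"large: {payload_size(large)} bytes, over limit: {exceeds_limit(large)}")
```

Running a check like this over a sample of real payloads shows whether the limit needs raising at all, or whether only a few outliers should be handled separately.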
Tuning OS TCP Buffers (Linux):
This is where you’ll likely see the most significant gains for high-throughput, low-latency scenarios.
- Check: Look at your current kernel buffer settings.

      sysctl net.core.rmem_max
      sysctl net.core.wmem_max
      sysctl net.ipv4.tcp_rmem
      sysctl net.ipv4.tcp_wmem

  Default values might be around 212992 for rmem_max/wmem_max, 4096 87380 6291456 for tcp_rmem, and 4096 16384 4194304 for tcp_wmem, though they vary with kernel version and available memory. The tcp_rmem and tcp_wmem values are min, default, max.

- Diagnosis: If netstat -s shows growing counts of receive or send buffer errors, or of packets pruned or collapsed in the receive queue because of low socket buffer space, or if your application latency correlates with network saturation, the OS buffers are likely too small.

- Command (Temporary - for testing): Increase the maximum send and receive buffer sizes.

      sysctl -w net.core.rmem_max=16777216   # 16MB
      sysctl -w net.core.wmem_max=16777216   # 16MB
      sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
      sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

  Note: The tcp_rmem/tcp_wmem values are min, default, max. We're increasing the max values here.

- Command (Permanent): Add the same settings to /etc/sysctl.conf and run sysctl -p.

- Why it works: Larger OS buffers allow the kernel to hold more data for each TCP connection before it has to drop packets or signal back pressure to the application. For memcached, this means it can accept more incoming data and send more outgoing data in larger chunks, which reduces the number of system calls and context switches and lets TCP sustain higher throughput. The tcp_rmem/tcp_wmem tuple lets the kernel adjust each socket's buffer dynamically between the min and max values based on network conditions, so a higher max only takes effect when a connection needs it.

- --enable-large-memory: While not a direct buffer size setting, building memcached with this option is described as helping with very large memory allocations that might be affected by standard system memory limits, indirectly aiding buffer management. Check that your memcached source tree offers it (./configure --help lists the available options). Separately, the runtime flag -L asks memcached to use large memory pages where the system supports them.

- Check Compilation: If the option is available in your source tree, build with it.

      ./configure --enable-large-memory
      make
      sudo make install

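The kernel settings covered in this section can also be read programmatically, which is handy for a pre-deployment check. This is a minimal sketch assuming a Linux host where sysctl values are exposed under /proc/sys. The function names and the 16 MB target are illustrative choices taken from the example values in this article, not recommendations from the kernel or memcached documentation.

```python
from pathlib import Path

# sysctl names from this article, written as paths relative to /proc/sys.
SETTINGS = (
    "net/core/rmem_max",
    "net/core/wmem_max",
    "net/ipv4/tcp_rmem",
    "net/ipv4/tcp_wmem",
)


def read_buffer_settings(proc_sys="/proc/sys"):
    """Return {sysctl_name: [int, ...]} for each setting that can be read."""
    result = {}
    for rel in SETTINGS:
        try:
            text = (Path(proc_sys) / rel).read_text()
        except OSError:
            continue  # not Linux, or the file is missing or unreadable
        result[rel.replace("/", ".")] = [int(v) for v in text.split()]
    return result


def below_target(settings, target_max=16 * 1024 * 1024):
    """List settings whose maximum (the last value) is below target_max."""
    return [name for name, values in settings.items() if values[-1] < target_max]


if __name__ == "__main__":
    current = read_buffer_settings()
    for name, values in current.items():
        print(name, values)
    print("Below 16MB max:", below_target(current))
```

For tcp_rmem and tcp_wmem the last of the three values is the max, and for rmem_max and wmem_max the single value is the max, so checking the last element covers both cases.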
The key takeaway is that memcached is a high-performance, low-level network server. Its performance is tightly coupled with the operating system's networking stack. Tuning memcached's max_request_size (the maximum item size, set with -I) is useful for specific large-payload scenarios, but for general low-latency, high-throughput workloads, OS TCP buffer tuning is critical.
After tuning these buffers, you might find that your application server’s CPU usage decreases, as it spends less time waiting for network I/O. You might also notice a reduction in the variability of your response times.
The next thing you’ll likely encounter is tuning the number of worker threads memcached uses via the -t flag, which directly impacts its ability to process requests concurrently and keep those buffers drained.