Linux TCP/IP tuning is less about making the network "faster" and more about stopping the operating system from getting in the way of the data flow.

Let’s watch a tiny, real-world example. Imagine two machines, sender and receiver, talking over TCP.

On sender:

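# Build a canned HTTP response with a 1MB body.
# printf emits real CRLFs; head writes the zero bytes directly, because
# command substitution would drop NUL bytes and leave the body empty.
{ printf 'HTTP/1.0 200 OK\r\nContent-Length: 1048576\r\n\r\n'; head -c 1048576 /dev/zero; } > http_response.txt
# Serve it once with netcat (OpenBSD netcat takes "nc -l 8080" without -p)
nc -l -p 8080 < http_response.txt &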
sender_pid=$!

On receiver:

# Download the 1MB file
time nc sender 8080 > /dev/null

This will likely finish in milliseconds. Now, let’s simulate a link with added latency. We’ll use tc (traffic control) with the netem qdisc on the sender to delay every outgoing packet by a fixed amount.

On sender:

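# Add a 50ms delay to packets leaving the sender.
# eth0 is an assumed name: use the interface that faces receiver
# (lo only affects traffic that stays on the same machine).
sudo tc qdisc add dev eth0 root netem delay 50ms
# Remove it when done: sudo tc qdisc del dev eth0 root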

Now restart the nc listener on sender (it exits after serving one connection) and run the download again on receiver:

# Download the 1MB file again
time nc sender 8080 > /dev/null

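The download time will jump significantly, and by more than the 50ms you added. The delay raises the round-trip time by 50ms, and TCP needs one round trip for the handshake plus several more for slow start to ramp up to 1MB, so expect a few hundred milliseconds in total. This is the baseline of what happens when the network itself adds delay. But what if the Linux kernel adds its own delay, even on a clean link? That’s where tuning comes in.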

The primary goal of TCP/IP tuning is to manage the kernel’s internal buffers and congestion control algorithms to efficiently utilize available bandwidth without causing excessive latency or packet loss. This involves adjusting parameters that govern how much data can be "in flight" before an acknowledgment is received (windowing), how the system reacts to packet loss, and how it prioritizes network traffic.

The main levers you have are within the /proc/sys/net/ipv4/ and /proc/sys/net/core/ directories.

Buffer Management:

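  • net.core.rmem_max: The largest receive buffer, in bytes, that an application can request for a single socket with SO_RCVBUF.
  • net.core.wmem_max: The largest send buffer, in bytes, that an application can request for a single socket with SO_SNDBUF.
  • net.ipv4.tcp_rmem: A 3-element array: min default auto_max, in bytes. Sets the minimum, initial, and autotuning-maximum receive buffer size of each TCP socket.
  • net.ipv4.tcp_wmem: A 3-element array: min default auto_max, in bytes. Sets the minimum, initial, and autotuning-maximum send buffer size of each TCP socket.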

These settings are crucial. TCP can only keep as much data in flight as its buffers allow. If they are too small for the path, the receiver advertises a small window, the sending application blocks waiting for buffer space, and throughput falls below what the link can carry.

Congestion Control:

  • net.ipv4.tcp_congestion_control: The algorithm used to manage congestion. Common options are reno (the classic algorithm, older), cubic (default on modern Linux, good general-purpose), and bbr (developed by Google, aims for higher throughput on lossy or high-latency links).
  • net.ipv4.tcp_slow_start_after_idle: Controls whether TCP resets its congestion window after a period of inactivity. Setting to 0 can improve throughput for servers with intermittent traffic.
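
Before switching algorithms it helps to confirm that the kernel actually offers the one you want. The sketch below is one way to check; cc_available is a hypothetical helper name, and the explicit list in the example is made up for illustration. With no second argument it reads the kernel's own list from /proc.

```shell
# cc_available ALGO [LIST]
# Succeeds if ALGO appears in LIST, a space-separated list of algorithm names.
# LIST defaults to the kernel's list when the /proc file is readable.
# (cc_available is a hypothetical helper, not a standard tool.)
cc_available() {
    algo=$1
    list=${2-$(cat /proc/sys/net/ipv4/tcp_available_congestion_control 2>/dev/null)}
    for name in $list; do
        [ "$name" = "$algo" ] && return 0
    done
    return 1
}

# Example with an explicit, illustrative list:
if cc_available bbr "reno cubic bbr"; then
    echo "bbr is available"
else
    echo "bbr is not available"
fi
```

On a live host, calling cc_available bbr with no list checks the running kernel.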

Timers and Retransmissions:

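  • net.ipv4.tcp_rto_min, net.ipv4.tcp_rto_max: Minimum and maximum retransmission timeout values. Sysctls with these exact names are not present on many kernels, so check what yours exposes with sysctl -a | grep rto. The minimum RTO can also be set per route with the rto_min option of ip route.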
  • net.ipv4.tcp_keepalive_time: How long a connection must sit idle before the kernel sends the first keepalive probe (7200 seconds by default). It applies only to sockets that enable SO_KEEPALIVE.

Let’s say you have a high-bandwidth, high-latency connection (e.g., a satellite link or a transcontinental fiber path) and your current throughput is poor. The bottleneck isn’t the link speed itself, but the TCP window size. The "Bandwidth-Delay Product" (BDP) is the maximum amount of data that can be in transit at any given time. BDP = Bandwidth * Round-Trip Time (RTT).

If your RTT is 100ms (0.1s) and your link is 1Gbps (10^9 bits/s), your BDP is 10^9 bits/s * 0.1s = 10^8 bits, or 12.5 MB. If your TCP send buffer (net.ipv4.tcp_wmem) is set much lower than this, say 64KB, you can only have 64KB of data outstanding. Even though the link could carry 12.5MB, TCP will only send 64KB, wait for an ACK, send another 64KB, and so on. That caps throughput at 64KB per 100ms, roughly 5 Mbit/s on a 1Gbps link.
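
The arithmetic above is easy to script for other links. This is a minimal sketch using integer shell arithmetic; bdp_bytes and window_limited_kbps are hypothetical helper names, and the results round down.

```shell
# bdp_bytes MBPS RTT_MS
# Bandwidth-delay product in bytes for a link of MBPS megabits per second
# and a round-trip time of RTT_MS milliseconds.
# (MBPS * 10^6 bits/s) * (RTT_MS / 1000 s) / 8 bits per byte = MBPS * RTT_MS * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# window_limited_kbps WINDOW_BYTES RTT_MS
# Upper bound on throughput in kilobits per second when at most WINDOW_BYTES
# can be unacknowledged per round trip.
# bytes * 8 bits / RTT_MS ms = bits per millisecond = kbit/s
window_limited_kbps() {
    echo $(( $1 * 8 / $2 ))
}

bdp_bytes 1000 100              # 1Gbps at 100ms RTT -> 12500000
window_limited_kbps 65536 100   # 64KB window at 100ms RTT -> 5242
```

The second result, about 5.2 Mbit/s, is roughly half a percent of the 1Gbps the link could carry.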

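To fix this, you’d increase net.ipv4.tcp_wmem on the sender to accommodate the BDP, and net.ipv4.tcp_rmem on the receiver, since the receive window limits the data in flight just as much as the send buffer does.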

Diagnosis Command:

To see current values:

sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.ipv4.tcp_congestion_control
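
To compare what sysctl reports against a target BDP, a small helper along these lines may be useful. It is a rough check only: buffer_covers_bdp and report are hypothetical names, the example triples are illustrative, and the usable window is somewhat smaller than the buffer because the kernel reserves part of it for bookkeeping overhead.

```shell
# buffer_covers_bdp "MIN DEFAULT MAX" BDP_BYTES
# Succeeds if the third field (the autotuning maximum) is at least BDP_BYTES.
buffer_covers_bdp() {
    triple=$1
    bdp=$2
    set -- $triple          # intentional word splitting: $1=min $2=default $3=max
    [ "$3" -ge "$bdp" ]
}

# report NAME "MIN DEFAULT MAX" BDP_BYTES
report() {
    if buffer_covers_bdp "$2" "$3"; then
        echo "$1: max covers a BDP of $3 bytes"
    else
        echo "$1: max is below a BDP of $3 bytes"
    fi
}

# Examples with literal, illustrative values:
report tcp_wmem "4096 16384 4194304" 12500000
report tcp_wmem "4096 131072 16777216" 12500000
```

On a live host, pass "$(sysctl -n net.ipv4.tcp_wmem)" as the second argument to check the current setting.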

Example Tuning for High-Bandwidth/High-Latency:

Let’s assume a BDP of 12.5 MB as calculated above.

  1. Increase Max Buffers:

    # Set max receive and send buffers to 16MB (16777216 bytes)
    sudo sysctl -w net.core.rmem_max=16777216
    sudo sysctl -w net.core.wmem_max=16777216
    

    Why: These values cap what an application can request with SO_RCVBUF/SO_SNDBUF. Raising them lets applications that size their own socket buffers go as high as the BDP requires.

  2. Tune TCP Buffers:

    # Set TCP receive buffer: min 4KB, default 128KB, auto_max 16MB
    sudo sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"
    # Set TCP send buffer: min 4KB, default 128KB, auto_max 16MB
    sudo sysctl -w net.ipv4.tcp_wmem="4096 131072 16777216"
    

    Why: The auto_max value here is critical. It tells TCP it can dynamically grow its send/receive windows up to 16MB to fill the BDP. The default value provides a reasonable starting point for most connections.

  3. Consider BBR Congestion Control: If you’re on a modern kernel (4.9+):

    # Check available algorithms
    cat /proc/sys/net/ipv4/tcp_available_congestion_control
    # If bbr is not listed, the module may just not be loaded yet: sudo modprobe tcp_bbr
    # If bbr is available, enable it
    sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
    

    Why: BBR aims to achieve optimal throughput by directly measuring bandwidth and RTT, rather than relying solely on packet loss as a signal for congestion (like Reno/Cubic). It often performs better on networks with significant bufferbloat or packet loss.
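
Changes made with sysctl -w last only until the next reboot. One way to keep them is a drop-in file under /etc/sysctl.d/. The sketch below writes such a file to a temporary path so you can inspect it first; write_tuning_conf and the file name 90-tcp-tuning.conf are hypothetical choices, and the values are the ones used in the steps above.

```shell
# write_tuning_conf PATH
# Write the buffer settings from the steps above to PATH in sysctl.conf format.
write_tuning_conf() {
    cat > "$1" <<'EOF'
# TCP buffer tuning for a BDP of about 12.5 MB
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 131072 16777216
net.ipv4.tcp_wmem = 4096 131072 16777216
# Uncomment only if bbr is listed in tcp_available_congestion_control
# net.ipv4.tcp_congestion_control = bbr
EOF
}

conf=$(mktemp)
write_tuning_conf "$conf"
cat "$conf"

# To install and load it (file name is an assumption, pick your own):
# sudo install -m 644 "$conf" /etc/sysctl.d/90-tcp-tuning.conf
# sudo sysctl --system
```

Writing to a scratch file first keeps the example harmless to run and leaves the privileged steps as an explicit, separate decision.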

Important Note: These are global settings. If you have many applications with different needs, you might need more sophisticated approaches like iptables mangle rules to mark traffic and tc filters to apply different queueing disciplines per class of traffic, or SO_RCVBUF/SO_SNDBUF socket options in your application code to size buffers per socket. Keep in mind that setting those socket options turns off the kernel's buffer autotuning for that socket.

The next thing you’ll likely encounter is tuning the kernel’s per-CPU input backlog (net.core.netdev_max_backlog) and understanding how it interacts with the network interface’s receive ring.
