TCP congestion control is the unsung hero of the internet, preventing a digital traffic jam by dynamically adjusting how much data you send based on network conditions.
Imagine sending high-speed trains full of packages across a country. If you send too many trains at once, the tracks get overloaded, trains crash, and packages get lost. TCP congestion control is like a sophisticated rail dispatcher for these data trains, making sure they don’t overwhelm the network.
Let’s see it in action. We can observe TCP’s behavior using tools like ss (socket statistics) on Linux.
ss -ti 'dport = :80'
This command shows TCP socket information for connections to remote port 80 (typically HTTP). On a server, filter on sport instead to see the connections it is serving. Look for lines like:
ESTAB 0 0 192.168.1.10:50000 172.217.160.142:80
...
cwnd:10 ssthresh:10 rtt:20.5/2.5 backoff:1
Here, cwnd (the send congestion window, called snd_cwnd inside the kernel) is the key. It caps the amount of unacknowledged data TCP can have in flight, counted in segments. ssthresh is the slow start threshold, and rtt is printed as the smoothed round-trip time followed by its mean deviation, both in milliseconds. (ss also prints a field called rcv_ssthresh, but that is a receive-side window limit, not the slow start threshold.) When cwnd is small, TCP is being cautious. As data flows and acknowledgments (ACKs) arrive reliably, cwnd grows, allowing more data. If packets are lost (detected through duplicate ACKs or timeouts), cwnd and ssthresh are sharply reduced, and TCP enters a slower, more conservative sending phase. This is the core of congestion control: slow down when things look bad, speed up when they look good.
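To get a feel for what a given window means in practice, a rough upper bound on throughput is one congestion window per round trip. The sketch below plugs in the window and RTT values from the sample output. The 1460-byte segment size is an assumption (a common value on Ethernet paths), and the function name is made up for illustration.

```python
def window_limited_throughput_bps(cwnd_segments: int, mss_bytes: int, rtt_seconds: float) -> float:
    """Rough upper bound: at most one full congestion window is delivered per RTT."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return cwnd_segments * mss_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    # 10 segments of an assumed 1460 bytes each, over a 20.5 ms round trip
    bps = window_limited_throughput_bps(cwnd_segments=10, mss_bytes=1460, rtt_seconds=0.0205)
    print(f"{bps / 1e6:.2f} Mbit/s")
```

With these numbers the ceiling is about 5.7 Mbit/s, no matter how fast the underlying link is. That is why a small window on a long path feels slow.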
The fundamental problem TCP congestion control solves is the "tragedy of the commons" on packet-switched networks. Without it, every host would try to send as much data as fast as possible, leading to overflowing router queues, widespread packet loss, and ultimately congestion collapse, where the network stays busy but delivers very little useful data. Congestion control ensures that the available network bandwidth is shared somewhat equitably and that the network remains usable.
Internally, TCP uses several algorithms to manage this. The most common ones are:
- Slow Start: When a connection begins, TCP starts with a small congestion window (typically 1-10 segments) and doubles it every Round Trip Time (RTT) until it reaches the slow start threshold (ssthresh). This is a rapid ramp-up.
- Congestion Avoidance: Once ssthresh is reached, TCP enters congestion avoidance. Here, the congestion window increases linearly (by about one segment per RTT) instead of exponentially. This is a more cautious growth.
- Fast Retransmit: If a sender receives three duplicate ACKs for the same data, it infers that the segment right after that data was lost and retransmits it immediately without waiting for a retransmission timeout.
- Fast Recovery: After a fast retransmit, TCP enters fast recovery. It reduces the congestion window and slow start threshold, then inflates the window to account for the retransmitted segment, allowing other segments to continue flowing.
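The first two phases can be sketched as a loss-free toy simulation. This is a simplified model, not kernel behavior: it counts the window in whole segments, advances once per RTT, and caps the last doubling at ssthresh. The function name and default values are assumptions chosen for illustration.

```python
from typing import List


def simulate_cwnd(rtts: int, initial_cwnd: int = 10, ssthresh: int = 64) -> List[int]:
    """Return the congestion window (in segments) at the start of each RTT, assuming no loss."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh:
            # Slow start: double per RTT, but do not overshoot ssthresh (a simplification)
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: grow by about one segment per RTT
            cwnd += 1
    return history


if __name__ == "__main__":
    print(simulate_cwnd(8))
```

Running it for 8 RTTs with the defaults yields 10, 20, 40, 64, 65, 66, 67, 68: three quick doublings, then a slow linear climb.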
The exact values for snd_cwnd and ssthresh are dynamic and depend on the specific TCP congestion control algorithm implemented by the operating system. Common algorithms include Reno, CUBIC, and BBR. On Linux, you can see the default algorithm for new connections by reading /proc/sys/net/ipv4/tcp_congestion_control. For example, you might see cubic or bbr.
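If you want the same check from a script, a minimal sketch is to read that file directly. The helper name is hypothetical, and it returns None on systems where the file does not exist (for example, anything that is not Linux). The value is the system-wide default for new sockets, and individual sockets can be set to a different algorithm.

```python
from pathlib import Path
from typing import Optional

DEFAULT_PATH = "/proc/sys/net/ipv4/tcp_congestion_control"


def read_default_congestion_control(path: str = DEFAULT_PATH) -> Optional[str]:
    """Return the default congestion control algorithm name, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable (non-Linux system, restricted container, ...)
        return None


if __name__ == "__main__":
    print(read_default_congestion_control() or "unknown")
```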
Let’s say you’re experiencing slow downloads. A common culprit is a router or endpoint with a small buffer, leading to packet loss even under moderate load. If your TCP connection is constantly entering fast retransmit and recovery, it may well be overflowing these small buffers.
Before blaming buffers, though, rule out a path MTU problem, which can look similar from the outside. Use ping with a large packet size and the "do not fragment" flag to see whether large packets make it through intermediate routers at all.
ping -s 1400 -M do 8.8.8.8
If this fails, it suggests a network path with a small MTU or a device dropping large packets, and the fix, if you’re on a controlled network, is adjusting the MTU. If it succeeds, MTU is not the problem, and persistent loss points back at congestion or undersized buffers, where the fix is configuring your network devices to have larger buffers. For a single host, there’s not much you can do directly other than hope the network path improves or that the remote server uses a more robust congestion control algorithm.
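One detail that is easy to miss: the -s value in the ping command is the ICMP payload size, not the size of the packet on the wire. The arithmetic below assumes IPv4 with no IP options (a 20-byte header) plus the 8-byte ICMP header; the helper names are made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
ICMP_HEADER_BYTES = 8


def ping_packet_size(payload_bytes: int) -> int:
    """Size of the IPv4 packet produced by `ping -s payload_bytes`."""
    return payload_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def max_ping_payload(mtu_bytes: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented packet for a given MTU."""
    return mtu_bytes - ICMP_HEADER_BYTES - IPV4_HEADER_BYTES


if __name__ == "__main__":
    print(ping_packet_size(1400))   # packet size tested by the command above
    print(max_ping_payload(1500))   # payload needed to probe a full 1500-byte MTU
```

So -s 1400 tests 1428-byte packets, and probing a standard 1500-byte Ethernet MTU takes -s 1472.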
The specific mechanics of how TCP reacts to packet loss are crucial. When TCP detects packet loss (either via timeout or triple duplicate ACKs), it drastically reduces its sending rate. In classic Reno, a timeout sets ssthresh to half the amount of data in flight and resets cwnd to 1 segment (entering slow start again). Triple duplicate ACKs (fast retransmit/recovery) also set ssthresh to half the data in flight, then set cwnd to ssthresh plus three segments for the duration of recovery. Once the loss is repaired, cwnd drops back to ssthresh and the connection continues in congestion avoidance with a reduced window. Other algorithms use different factors; CUBIC, for example, cuts the window by about 30% rather than half. This aggressive reduction is what prevents a cascade of losses and stabilizes the network.
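Here is a Reno-style sketch of the two reactions, following the classic rule of setting ssthresh to half the current window. It uses cwnd as a stand-in for the amount of data in flight and counts everything in whole segments, which are simplifying assumptions. The class and function names are hypothetical, not taken from any real TCP stack.

```python
from dataclasses import dataclass


@dataclass
class RenoState:
    cwnd: int      # congestion window, in segments
    ssthresh: int  # slow start threshold, in segments


def _halved(cwnd: int) -> int:
    # Half the window, but never below two segments
    return max(cwnd // 2, 2)


def on_timeout(state: RenoState) -> RenoState:
    """Retransmission timeout: restart from one segment in slow start."""
    return RenoState(cwnd=1, ssthresh=_halved(state.cwnd))


def on_triple_dupack(state: RenoState) -> RenoState:
    """Fast retransmit: halve, then inflate by the three segments the duplicate ACKs imply have left the network."""
    new_ssthresh = _halved(state.cwnd)
    return RenoState(cwnd=new_ssthresh + 3, ssthresh=new_ssthresh)


def on_recovery_complete(state: RenoState) -> RenoState:
    """Loss repaired: deflate the window and continue in congestion avoidance."""
    return RenoState(cwnd=state.ssthresh, ssthresh=state.ssthresh)


if __name__ == "__main__":
    before = RenoState(cwnd=40, ssthresh=64)
    print(on_timeout(before))
    print(on_recovery_complete(on_triple_dupack(before)))
```

Starting from a 40-segment window, the timeout path drops to 1 segment, while the fast recovery path settles at 20. That gap is why duplicate ACKs are a much cheaper way to find out about loss.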
What many people don’t realize is that the choice of congestion control algorithm can have a significant impact on performance, especially on high-bandwidth, high-latency networks (often called "long fat networks" or LFNs). CUBIC grows its window as a cubic function of the time since the last loss rather than by one segment per RTT, so it probes for bandwidth more aggressively on such links than older algorithms like Reno and is less biased against flows with long RTTs. Newer algorithms like BBR (Bottleneck Bandwidth and Round-trip propagation time) aim to achieve higher throughput and lower latency by directly measuring bottleneck bandwidth and RTT, rather than relying solely on packet loss as a signal of congestion.
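To see why long fat networks are hard on Reno-style growth, it helps to compute the bandwidth-delay product (BDP), the amount of data that must be in flight to fill the path. The sketch below then estimates how long a connection needs to climb back from half that window at one segment per RTT. The 1 Gbit/s link, 100 ms RTT, and 1460-byte segment size are example values, and the function names are made up for illustration.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return bandwidth_bps / 8 * rtt_seconds


def reno_recovery_seconds(bandwidth_bps: float, rtt_seconds: float, mss_bytes: int = 1460) -> float:
    """Time to grow from half the BDP back to the full BDP at one segment per RTT."""
    full_window_segments = bdp_bytes(bandwidth_bps, rtt_seconds) / mss_bytes
    rtts_needed = full_window_segments / 2
    return rtts_needed * rtt_seconds


if __name__ == "__main__":
    print(f"BDP: {bdp_bytes(1e9, 0.1) / 1e6:.1f} MB")
    print(f"Recovery after one loss: {reno_recovery_seconds(1e9, 0.1):.0f} s")
```

For these example values the BDP is 12.5 MB, and a single loss costs roughly 428 seconds of linear growth before the link is full again.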
The next challenge you’ll encounter is understanding how TCP interacts with Quality of Service (QoS) mechanisms and how different transport protocols (like UDP) bypass congestion control altogether.