The most surprising thing about network latency is that for many applications, the biggest contributor isn’t the physical distance or the speed of light, but rather the number of hops and the processing time at each intermediary device.
Imagine you’re sending a letter. You could write it yourself, walk it to the mailbox, and it’s delivered. That’s low latency. Now imagine you have to send it to an office, have them re-type it, put it in a new envelope, send it to another office for approval, then finally to the recipient. Each stop adds time. Network traffic often works like this, with routers, firewalls, and load balancers acting as those intermediary offices.
Let’s see this in action. We’ll use curl to time a single request to a web server.
# On your client machine
curl -o /dev/null -s -w "
Total time: %{time_total}s
Name lookup: %{time_namelookup}s
Connect: %{time_connect}s
App connect: %{time_appconnect}s
Pretransfer: %{time_pretransfer}s
Redirect: %{time_redirect}s
Start transfer: %{time_starttransfer}s
" https://example.com
Running this command shows timings for the stages of a single HTTPS request. Every value is cumulative, measured from the moment curl starts. time_namelookup is when DNS resolution finished, time_connect is when the TCP connection was established, and time_appconnect is when the TLS handshake completed. time_starttransfer is when the first byte of the response arrived, and time_total is the end-to-end time. To isolate one stage, subtract the value that precedes it. The difference between time_starttransfer and time_pretransfer is the time spent sending the request and waiting for the first byte, which covers a round trip through every hop and intermediate device plus the server’s processing time. The difference between time_total and time_starttransfer is the time needed to receive the rest of the response body.
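Because curl reports cumulative timestamps, a little subtraction is needed to get per-stage durations. The following is a minimal sketch of that arithmetic in Python. The function name phase_durations and the sample numbers are made up for illustration and do not come from a real run.

```python
def phase_durations(t):
    """Convert curl's cumulative timestamps (seconds) into per-stage durations."""
    # time_appconnect is 0 for plain HTTP, so only count TLS when it was used.
    used_tls = t["time_appconnect"] > 0
    tls = t["time_appconnect"] - t["time_connect"] if used_tls else 0.0
    return {
        "dns": t["time_namelookup"],
        "tcp_connect": t["time_connect"] - t["time_namelookup"],
        "tls_handshake": tls,
        # Request sent, crosses the network, server works, first byte returns.
        "request_and_wait": t["time_starttransfer"] - t["time_pretransfer"],
        # Remainder of the response body arriving.
        "content_transfer": t["time_total"] - t["time_starttransfer"],
    }


# Hypothetical values, shaped like the output of the curl command above.
sample = {
    "time_namelookup": 0.012,
    "time_connect": 0.045,
    "time_appconnect": 0.120,
    "time_pretransfer": 0.120,
    "time_starttransfer": 0.250,
    "time_total": 0.310,
}

for stage, seconds in phase_durations(sample).items():
    print(f"{stage:>18}: {seconds * 1000:.1f} ms")
```

In this invented sample, waiting for the first byte is the largest single stage, which is the pattern to look for when intermediaries or the server are slow.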
Latency is the time it takes to get data from point A to point B, and the complexity lies in how the data gets there. Data travels in packets. Each packet has a destination address. Routers look at this address and decide the next hop. This decision-making process, especially in large, complex networks, takes time. Firewalls inspect packets, load balancers distribute traffic, and each of these adds a small delay. When you have many such devices in a path, these small delays accumulate.
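To make the accumulation concrete, here is a rough model in Python that treats a path as a list of hops and sums the delay each one adds. The Hop class, its field names, and the numbers are all hypothetical. Real per-device delays vary widely and would have to be measured.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    processing_ms: float  # route lookup, packet inspection, etc.
    queuing_ms: float     # time spent waiting in the device's buffer
    link_ms: float        # time on the wire to the next device


def one_way_latency_ms(path):
    """Sum every hop's contribution to the one-way delay."""
    return sum(h.processing_ms + h.queuing_ms + h.link_ms for h in path)


# Illustrative path; the values are invented, not measurements.
path = [
    Hop("home router", 0.1, 0.2, 0.5),
    Hop("ISP edge", 0.1, 0.5, 2.0),
    Hop("firewall", 0.4, 0.3, 0.2),
    Hop("load balancer", 0.3, 0.1, 0.1),
]

print(f"one-way: {one_way_latency_ms(path):.1f} ms")
print(f"round trip (symmetric path assumed): {2 * one_way_latency_ms(path):.1f} ms")
```

No single device dominates in this example, yet the total is many times larger than any one contribution.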
The levers you control are primarily related to the network path and the devices on it.
- Number of Hops: Each router, firewall, or NAT device in the path adds processing time. Minimizing these intermediaries is key.
- Congestion: If a link or device is overloaded, packets get queued, leading to increased latency.
- Physical Distance: While often not the primary culprit for most applications, the speed of light is still a hard lower bound. Signals in optical fiber travel at roughly two-thirds of the speed of light in a vacuum (copper is in a similar range), which works out to about 1 ms of round-trip time for every 100 km of path.
- Protocol Overhead: Everything sent besides your actual payload (headers, acknowledgments) can add up. TCP’s handshake, for instance, costs a full round trip before any application data can be sent.
- Device Performance: The CPU and memory of routers, firewalls, and servers themselves impact how quickly they can process packets.
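Two of these levers, distance and protocol overhead, lend themselves to a back-of-the-envelope estimate. The sketch below assumes signals in fiber cover roughly 200 km per millisecond (about two-thirds of the speed of light in a vacuum) and that a TCP handshake followed by a TLS 1.3 handshake costs two round trips before the first request can be sent. The function names and the 4,000 km distance are illustrative only.

```python
FIBER_KM_PER_MS = 200.0  # approximation: about 2/3 of c in a vacuum


def propagation_rtt_ms(path_km):
    """Best-case round-trip time imposed by distance alone."""
    return 2 * path_km / FIBER_KM_PER_MS


def setup_delay_ms(rtt_ms, handshake_round_trips):
    """Time spent on handshakes before the first request byte is sent."""
    return rtt_ms * handshake_round_trips


rtt = propagation_rtt_ms(4000)  # hypothetical 4,000 km fiber path
print(f"propagation RTT: {rtt:.0f} ms")
# Assumption: 1 round trip for TCP plus 1 for TLS 1.3.
print(f"TCP + TLS 1.3 setup: {setup_delay_ms(rtt, 2):.0f} ms")
```

The real path is usually longer than the straight-line distance, so treat the result as a floor, not a prediction.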
When diagnosing latency, traceroute (or mtr) is your best friend.
# On your client machine
traceroute example.com
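This command shows the path packets take to reach example.com and the round-trip time (RTT) to each hop. Look for sudden jumps in latency between hops. A jump that persists through every later hop points to a potential bottleneck at that hop or on the link leading to it. A single hop with a high RTT that the following hops do not share is usually just a router giving low priority to answering traceroute probes, not a real delay. If the latency is consistently high from the first hop, the issue might be on your local network or with your ISP. If it rises midway and stays high, it’s likely a device or link in the transit path.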
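One common way to read traceroute output is to count a jump only if the hops after it stay high as well. The following Python sketch applies that heuristic to a list of per-hop RTTs. The function name persistent_jumps, the 20 ms threshold, and the sample RTTs are assumptions chosen for illustration, and the code assumes every hop answered.

```python
def persistent_jumps(hop_rtts_ms, threshold_ms=20.0):
    """Return 1-based hop numbers where RTT rises by more than threshold_ms
    over the previous hop and every later hop stays above that level."""
    flagged = []
    for i in range(1, len(hop_rtts_ms)):
        previous = hop_rtts_ms[i - 1]
        if hop_rtts_ms[i] - previous <= threshold_ms:
            continue
        floor = previous + threshold_ms
        # A jump on the final hop has no later hops to confirm it, so it is
        # reported as-is.
        if all(rtt > floor for rtt in hop_rtts_ms[i + 1:]):
            flagged.append(i + 1)
    return flagged


# Invented RTTs: hop 4 spikes alone, hop 6 starts a lasting increase.
rtts = [1.2, 8.5, 9.1, 95.0, 10.2, 82.4, 83.0, 84.1]
print(persistent_jumps(rtts))  # [6]
```

Hop 4 is ignored because hop 5 drops back to about 10 ms, while hop 6 is flagged because hops 7 and 8 stay above 80 ms.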
Reducing latency involves a multi-pronged approach. For applications sensitive to latency, consider:
- Proximity: Deploying servers closer to your users. This might involve Content Delivery Networks (CDNs) for static assets or geographically distributed application instances.
- Network Design: Flattening network topologies where possible, reducing the number of firewall hops, and ensuring sufficient bandwidth and processing power on core network devices.
- Protocol Optimization: Using more efficient protocols (like UDP for real-time applications where some packet loss is acceptable) or implementing techniques like TCP Fast Open.
- Connection Pooling: Reusing existing TCP connections to avoid the overhead of establishing new ones for each request.
- Caching: Storing frequently accessed data closer to the user or application to avoid fetching it over the network repeatedly.
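Connection reuse from the list above is easy to observe locally. The sketch below starts a throwaway HTTP server on the loopback interface using only the Python standard library, then counts how many TCP connections the server sees for five requests, with and without reuse. CountingHandler and count_connections are hypothetical names for this demo. On loopback the time saved is negligible, so the connection count stands in for the handshakes that would be avoided on a real network.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows keep-alive connections
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(reuse, requests=5):
    """Send `requests` GETs and return how many TCP connections were opened."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    try:
        pooled = http.client.HTTPConnection(host, port) if reuse else None
        for _ in range(requests):
            conn = pooled if pooled is not None else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if pooled is None:
                conn.close()
        if pooled is not None:
            pooled.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


print("new connection per request:", count_connections(reuse=False))
print("one reused connection:", count_connections(reuse=True))
```

Each avoided connection is at least one avoided TCP handshake, and an avoided TLS handshake as well when the traffic is encrypted.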
One often-overlooked aspect of latency is bufferbloat in network devices. When a router’s buffers fill up during congestion, the router queues packets instead of dropping them right away. This queuing delay, especially on older or less sophisticated hardware with oversized buffers, can add significant and variable latency. Even when the average bandwidth is sufficient, this buffering can make interactive applications feel sluggish because packets are held for extended periods before being forwarded. This is why many modern routers implement Active Queue Management (AQM) algorithms like CoDel or FQ-CoDel, which are designed to keep buffer delays low.
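The size of the delay follows from simple arithmetic: a packet arriving at the back of a full buffer must wait for everything ahead of it to drain at the link rate. Here is a small Python sketch of that calculation. The function name and the buffer and link figures are illustrative assumptions, not measurements of any particular device.

```python
def queue_delay_ms(queued_bytes, link_bits_per_second):
    """Time for a packet at the back of the queue to reach the wire."""
    return queued_bytes * 8 * 1000 / link_bits_per_second


# Hypothetical case: 125 kB already queued on a 10 Mbit/s uplink.
print(queue_delay_ms(125_000, 10_000_000))  # 100.0

# The same backlog on a 1 Gbit/s link drains 100 times faster.
print(queue_delay_ms(125_000, 1_000_000_000))  # 1.0
```

A fixed-size buffer therefore hurts most on slow links, which is where bufferbloat tends to be noticed.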
The next problem you’ll likely encounter after optimizing for latency is dealing with packet loss, which often goes hand-in-hand with congestion.