Packet loss is usually a symptom, not the disease, and the real killer is often a saturated link or a misconfigured device that’s silently dropping traffic.

Let’s say you’re seeing intermittent application failures, slow performance, or even outright connection drops. Your first instinct might be to blame the application, but if you’ve ruled that out, it’s time to look at the network.

The most common culprit for packet loss is a saturated network interface. This happens when a device (router, switch, server NIC) is receiving or transmitting more data than its hardware or configured bandwidth can handle. The device’s internal buffers fill up, and once they are full it has no choice but to drop packets, on the receive side, the transmit side, or both.
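To make the buffer behavior concrete, here is a minimal sketch of a fixed-size FIFO buffer that tail-drops once it is full. The function name, the tick-based timing, and the numbers are all illustrative assumptions, not a model of any particular device.

```python
def simulate_tail_drop(arrivals_per_tick, service_per_tick, buffer_size, ticks):
    """Simulate a FIFO buffer that drops arriving packets once it is full.

    Returns (packets still queued at the end, total packets dropped).
    """
    queued = 0
    dropped = 0
    for _ in range(ticks):
        # Arrivals are accepted until the buffer is full; the rest are dropped.
        space = buffer_size - queued
        accepted = min(arrivals_per_tick, space)
        dropped += arrivals_per_tick - accepted
        queued += accepted
        # The interface then transmits up to its per-tick capacity.
        queued -= min(queued, service_per_tick)
    return queued, dropped


# Offered load 20% above capacity: the buffer fills, then drops begin.
print(simulate_tail_drop(arrivals_per_tick=12, service_per_tick=10,
                         buffer_size=50, ticks=100))   # (40, 160)

# Offered load below capacity: nothing queues and nothing is dropped.
print(simulate_tail_drop(arrivals_per_tick=8, service_per_tick=10,
                         buffer_size=50, ticks=100))   # (0, 0)
```

In the overloaded case there are no drops at all for the first 20 ticks while the buffer absorbs the excess. That is why loss from saturation often shows up only after load has been high for a while.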

To diagnose this, you’ll want to check the interface statistics on your network devices. On Cisco IOS, for instance, you’d use show interface <interface_name>. Look for output like this:

GigabitEthernet0/1 is up, line protocol is up
  Hardware is Gigabit Ethernet, address is 001a.23ff.4567 (bia 001a.23ff.4567)
  Description: Uplink to Core Switch
  Internet address is 192.168.1.1/24
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 180/255, rxload 200/255
  <...snip...>
  10457892 input packets, 1234567890 bytes, 0 no buffer
  Received 56789 broadcasts (0 multicasts)
  0 runts, 0 giants, 0 throttles, 0 parity
  0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
  0 watchdog, 0 multicast, 0 pause input
  11567892 output packets, 1345678901 bytes, 0 underruns
  0 output errors, 0 collisions, 1 interface resets
  0 unknown protocol drops
  0 babbles, 0 late collision, 0 deferred
  0 lost carrier, 0 no carrier, 0 pause output
  0 output buffer failures, 0 output buffers collapsed, 0 output buffers swapped out

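Pay close attention to txload and rxload. These are fractions out of 255, not percentages, indicating how busy the interface is in each direction; in the sample above, rxload 200/255 works out to about 78%. If either is consistently above 200/255, you’ve likely found your problem. Keep in mind that these figures are averaged over the interface’s load interval (five minutes by default), so short bursts can cause drops without ever pushing the load that high. Most IOS versions also report input queue drops and a total output drops counter in this output; those count packets discarded because a queue was full. The input errors and output errors counters, along with their breakdowns (CRC, frame, overrun, underruns, collisions), are also critical indicators. A sudden rise in overruns or underruns points to a device that can’t keep up, while CRC and frame errors usually point to physical layer issues.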
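As a quick sanity check on the arithmetic, the load values can be converted to percentages with a few lines of Python. This is a sketch that assumes the line format shown in the sample output above; load_percent is a hypothetical helper, not part of any Cisco tooling.

```python
import re


def load_percent(line):
    """Parse txload/rxload fractions from a show interface line into percentages."""
    loads = {}
    for name, num, den in re.findall(r"(txload|rxload) (\d+)/(\d+)", line):
        loads[name] = round(100 * int(num) / int(den), 1)
    return loads


sample = "reliability 255/255, txload 180/255, rxload 200/255"
print(load_percent(sample))   # {'txload': 70.6, 'rxload': 78.4}
```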

The fix here is usually to increase the bandwidth of the link or reduce the traffic on it. If it’s a server NIC, you might upgrade to a 10GbE card. If it’s a switch port, ensure it’s connected to a higher-capacity uplink. For routers, you might implement Quality of Service (QoS) to prioritize critical traffic and shape less important traffic, which protects important flows but adds no capacity, or upgrade the router’s processing power or link speeds. If the load stays high, investigate which application or service is sending that much traffic and whether it should be.

Next up is a misconfigured Quality of Service (QoS) policy. Sometimes, instead of all traffic being dropped when an interface is saturated, QoS policies are designed to selectively drop lower-priority traffic. If your QoS configuration is too aggressive or misapplied, it might be dropping legitimate, high-priority traffic, making it seem like general packet loss.

On Cisco devices, you’d check your QoS configuration with show policy-map interface <interface_name>. Look for the packets dropped counters within the policy map. For example:

  Service-policy output: MyQoSMap

    Class-map: VOICE
      0 packets, 0 bytes
      Priority   (local exceed 1000000000)
        1000000000 bits/sec, 1000000000 packets/sec
        police:
          cir 1000000000, bc 1000000000
    Class-map: VIDEO
      50000 packets, 5000000000 bytes
      Weighted Fair Queueing
        (1000000000 bits/sec, 1000000000 packets/sec)
    Class-map: DATA
      1000000 packets, 100000000000 bytes
      Weighted Fair Queueing
        (2000000000 bits/sec, 2000000000 packets/sec)
    Class-map: CONTROL
      1000 packets, 1000000 bytes
      Weighted Fair Queueing
        (500000000 bits/sec, 500000000 packets/sec)
    Class-map: class-default
      5000 packets, 5000000 bytes
      Weighted Fair Queueing
        (1000000000 bits/sec, 1000000000 packets/sec)

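If you see a large number of packets being dropped in a specific class, or even in class-default, it indicates that your QoS policy is actively policing and dropping traffic. The excerpt above is trimmed to the per-class packet and byte counts; full output also reports drops per class, with field names that vary by platform and software version.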

The fix is to tune your QoS policy. This might involve increasing the bandwidth allocated to certain classes, adjusting the police or shape rates, or ensuring that traffic is correctly classified. You might need to re-evaluate your bandwidth requirements and application priorities.
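To see why a police rate that is set too low produces steady drops, it helps to look at the token bucket idea behind most policers. The following is a simplified single-rate sketch, not Cisco’s implementation; the function name, parameters, and traffic pattern are assumptions chosen for illustration.

```python
def police(packets, cir_bps, burst_bytes):
    """Single-rate token bucket policer.

    packets: list of (arrival_time_seconds, size_bytes), in time order.
    Returns (conforming packet count, dropped packet count).
    """
    tokens = burst_bytes      # the bucket starts full
    last_time = 0.0
    conformed = 0
    dropped = 0
    for time_s, size in packets:
        # Refill at the committed rate (bits/sec -> bytes/sec), capped at the burst size.
        tokens = min(burst_bytes, tokens + (time_s - last_time) * cir_bps / 8)
        last_time = time_s
        if size <= tokens:
            tokens -= size    # packet conforms and spends tokens
            conformed += 1
        else:
            dropped += 1      # packet exceeds the rate; tokens are left untouched
    return conformed, dropped


# 500-byte packets every 0.25 s is 16 kbit/s offered against an 8 kbit/s CIR.
traffic = [(i * 0.25, 500) for i in range(8)]
print(police(traffic, cir_bps=8000, burst_bytes=1000))   # (5, 3)
```

The initial burst allowance lets the first three packets through, after which roughly every other packet is dropped. Raising the rate or the burst size changes where that cutoff falls, which is what tuning the policy amounts to.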

Another common cause is a faulty or overloaded network device. Routers and switches have internal components, including ASICs (Application-Specific Integrated Circuits) and memory, that can fail or become a bottleneck. If a device is overheating, has a failing power supply, or is simply running at its processing limit, it can start dropping packets.

Diagnosing this often involves looking at the device’s system logs (show logging on Cisco) for hardware errors, CPU utilization (show processes cpu sorted), and memory usage (show memory statistics). High CPU or memory utilization, especially if sustained, is a strong indicator. Also, check for any fan failures or environmental alerts.
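If you collect this output regularly, a small parser makes it easier to spot sustained load as opposed to a momentary spike. The sketch below assumes the summary line format commonly printed at the top of show processes cpu output; the sample line and the 80% threshold are illustrative assumptions, so adjust both to your platform.

```python
import re

CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu(line):
    """Extract CPU percentages from the summary line, or return None if absent."""
    match = CPU_LINE.search(line)
    if match is None:
        return None
    five_sec, interrupt, one_min, five_min = map(int, match.groups())
    return {"five_sec": five_sec, "interrupt": interrupt,
            "one_min": one_min, "five_min": five_min}


def is_sustained_high(cpu, threshold=80):
    """Treat load as sustained only if both longer averages exceed the threshold."""
    return cpu["one_min"] >= threshold and cpu["five_min"] >= threshold


# Hypothetical sample line, not captured from a real device.
sample = "CPU utilization for five seconds: 97%/45%; one minute: 91%; five minutes: 88%"
cpu = parse_cpu(sample)
print(cpu, is_sustained_high(cpu))
```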

The fix is usually to replace the faulty hardware or offload processing. If a router’s CPU is maxed out by routing protocols, you might need a more powerful model. If a switch is suffering from a faulty ASIC, it needs to be RMA’d. Sometimes, disabling unused features or simplifying the configuration can free up CPU cycles.

Physical layer issues can also manifest as packet loss. This includes bad Ethernet cables, failing SFP modules, or dirty fiber connectors. While these often cause more obvious link flapping or CRC errors, they can also lead to intermittent packet corruption and drops that are harder to trace.

Check the physical connections meticulously. Reseat cables, try known-good cables, and clean fiber connectors with appropriate cleaning tools. On Cisco devices, show interface <interface_name> transceiver detail can sometimes provide diagnostics for SFPs that support them, and show controllers <interface_name> might offer lower-level hardware status. Look for CRC, input errors, frame errors, and giants in the show interface output, as these are often symptomatic of physical layer problems.
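Because these counters are cumulative, a single reading says little; what matters is how fast they grow relative to traffic. Here is a minimal sketch that compares two snapshots taken some time apart. The dictionary keys and the second snapshot’s values are hypothetical, and you would fill them in from your own show interface readings.

```python
def error_rate(before, after, error_key, packet_key="input_packets"):
    """Return new errors per new packet between two counter snapshots.

    Returns None when no packets arrived or the counters were cleared in between.
    """
    new_errors = after[error_key] - before[error_key]
    new_packets = after[packet_key] - before[packet_key]
    if new_packets <= 0 or new_errors < 0:
        return None
    return new_errors / new_packets


# First snapshot matches the sample output; the second is made up for illustration.
first = {"input_packets": 10457892, "crc": 0}
second = {"input_packets": 10557892, "crc": 25}

rate = error_rate(first, second, "crc")
print(f"CRC error rate: {rate:.3%}")   # CRC error rate: 0.025%
```

Taking a fresh pair of snapshots after each cable or SFP swap tells you whether the change actually stopped the errors.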

The fix is to replace suspect cables or SFPs, or to clean the connectors. It’s a process of elimination: swap out components until the errors disappear.

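Finally, bufferbloat in network devices can cause significant latency and, eventually, packet loss under load. The problem typically appears where a fast link feeds a slower one: packets queue up in the device’s buffers, and latency climbs as the queue grows. Oversized buffers make this worse, because packets can sit in the queue for a long time before anything is dropped, so senders get no timely signal to slow down. Small buffers fail the other way: the queue overflows quickly, and packets are dropped during bursts.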

You can sometimes infer bufferbloat from high RTT (Round Trip Time) measurements during periods of high network utilization, even if interface utilization doesn’t appear critically saturated. Tools like mtr (My Traceroute) can help visualize latency increases along a path. The show interface <interface_name> output on Cisco devices includes input queue and output queue statistics, though these counters do not always reflect buffering that is handled internally in hardware.
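One simple way to quantify this is to compare RTT samples taken while the link is idle with samples taken while it is busy. The sketch below uses the median to reduce the influence of outliers; the sample values are invented for illustration, and in practice you would collect them with ping or mtr.

```python
from statistics import median


def rtt_inflation(idle_ms, loaded_ms):
    """Return the increase in median RTT (ms) from idle to loaded conditions."""
    return median(loaded_ms) - median(idle_ms)


# Hypothetical RTT samples in milliseconds.
idle = [11.8, 12.1, 12.4, 11.9, 12.0]
loaded = [85.2, 140.7, 96.3, 210.5, 120.1]

print(f"Median RTT grew by {rtt_inflation(idle, loaded):.1f} ms under load")
# Median RTT grew by 108.1 ms under load
```

An increase of tens or hundreds of milliseconds that appears only under load, on a path whose idle RTT is low, suggests queuing delay rather than distance.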

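The fix is often to implement proper QoS with active queue management and ECN (Explicit Congestion Notification), so that congestion is signaled before queues grow long, or to upgrade to hardware with more sophisticated queue management algorithms. Larger buffers on their own tend to trade packet loss for latency. For end-user devices experiencing bufferbloat on their home routers, rebooting the router can sometimes give temporary relief, but it does not change how the router queues traffic.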

If you’ve addressed all these, the next thing you’ll likely encounter is intermittent application timeouts due to retransmissions.
