The Linux kernel’s TCP retransmit mechanism is failing because the network stack is dropping skb (socket buffer) structures that are essential for tracking and retransmitting lost packets.

Cause 1: Bufferbloat / Excessive Queuing

The most frequent culprit is excessive queuing within the network stack: buffers fill up and skbs are silently discarded. This usually shows up as rising fast-retransmit and retransmission-timeout counters rather than as an explicit error.

  • Diagnosis: Monitor network interface buffer usage and TCP buffer statistics.
    # Check interface statistics for dropped packets (look for 'dropped', 'overruns', 'errors')
    ip -s link show eth0
    
    # Check TCP memory limits (in pages) and current usage
    sysctl net.ipv4.tcp_mem
    cat /proc/net/sockstat
    grep Tcp /proc/net/snmp
    
    # Look for specific TCP retransmit counts
    netstat -s | grep -i retrans
    
  • Fix: Adjust TCP buffer sizes and potentially interface queue lengths.
    # Raise the maximum TCP send/receive buffer sizes (example values; defaults stay unchanged)
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'
    sysctl -w net.ipv4.tcp_wmem='4096 65536 16777216'
    
    # Replace the root qdisc on eth0 with fq_codel for fairer, shorter queues
    # ('replace' also works when no custom qdisc is installed, unlike 'del' followed by 'add')
    tc qdisc replace dev eth0 root fq_codel
    
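  • Why it works: Raising the maximums in tcp_rmem and tcp_wmem lets TCP autotuning grow per-connection buffers further, and rmem_max/wmem_max raise the ceiling for applications that size their buffers explicitly with SO_RCVBUF/SO_SNDBUF, so connections are less likely to hit buffer limits. Using fq_codel as the queueing discipline manages packet queues more intelligently: each flow gets its own queue and packets are dropped from flows that build a standing queue, which prevents individual flows from monopolizing buffer space and causing discards.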
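The raw counters in /proc/net/snmp are easier to judge as a ratio. The following is a minimal sketch, not a standard tool: retrans_pct is a hypothetical helper name, and it assumes the usual two-line "Tcp:" header/value layout of that file.

```shell
# retrans_pct [FILE]: print RetransSegs as a percentage of OutSegs.
# FILE defaults to /proc/net/snmp. Column positions are looked up by name
# from the header line rather than hard-coded.
retrans_pct() {
    awk '
        $1 == "Tcp:" && !have_header {
            # First Tcp: line holds the column names
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        $1 == "Tcp:" {
            # Second Tcp: line holds the counter values
            out = $(col["OutSegs"])
            re = $(col["RetransSegs"])
            if (out > 0) printf "%.2f\n", 100 * re / out
            else print "0.00"
        }
    ' "${1:-/proc/net/snmp}"
}
```

The counters are cumulative since boot, so a single reading only gives a long-term average. Running retrans_pct before and after a test window, or comparing two saved copies of the file, gives a better picture of whether a change helped.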

Cause 2: Network Congestion on Router/Switch

The Linux machine might be fine, but an upstream router or switch is experiencing congestion, leading to packet drops that the kernel can’t recover from gracefully. The sender only learns of a loss through duplicate ACKs or a retransmission timeout, so it keeps retransmitting into a path that is still dropping packets.

  • Diagnosis: Examine network device logs and interface statistics on intermediate network hardware. Use ping with large packet sizes and mtr to identify packet loss points.
    # Ping with 1472-byte payloads (1500-byte MTU minus 28 bytes of IP/ICMP headers)
    # and a short interval to stress the path (very short intervals may require root)
    ping -s 1472 -i 0.1 <destination_ip>
    
    # Use mtr to trace the path and identify loss
    mtr <destination_ip>
    
  • Fix: Address congestion on the network device. This might involve QoS configurations, increasing bandwidth, or offloading traffic. On the Linux side, you can try reducing the sending rate.
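    # Cap all egress traffic on eth0 at 50 Mbit/s (tbf shapes every packet, not only TCP)
    # This replaces any existing root qdisc; burst should be at least rate/HZ,
    # which is about 25 KB at 50 Mbit/s with HZ=250
    tc qdisc replace dev eth0 root tbf rate 50mbit burst 64kb latency 50ms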
    
  • Why it works: By limiting the outgoing rate, you reduce the burden on potentially congested intermediate devices, decreasing the chance of them dropping packets. tbf (Token Bucket Filter) is a simple mechanism to cap bandwidth.
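tbf's burst parameter has to be large enough for the configured rate, otherwise the shaper cannot reach that rate. The tc-tbf man page gives the rule of thumb of dividing the rate by the kernel timer frequency (HZ). The arithmetic can be sketched as below; tbf_min_burst_bytes is a hypothetical helper, and the default of HZ=250 is an assumption that should be checked against the running kernel's CONFIG_HZ.

```shell
# tbf_min_burst_bytes RATE_MBIT [HZ]: smallest burst, in bytes, that lets tbf
# reach RATE_MBIT when tokens are replenished once per timer tick.
# HZ defaults to 250 (an assumption; check CONFIG_HZ for your kernel).
tbf_min_burst_bytes() {
    rate_mbit=$1
    hz=${2:-250}
    # bits per second -> bytes per second -> bytes per tick
    echo $(( rate_mbit * 1000000 / 8 / hz ))
}
```

For example, tbf_min_burst_bytes 50 gives the minimum for a 50 Mbit/s cap at HZ=250. Treat the result as a floor and round up generously when choosing the actual burst value.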

Cause 3: NIC Hardware Offload Issues

Modern Network Interface Cards (NICs) offload certain TCP/IP processing tasks from the CPU (like checksum calculation, segmentation, and reassembly). If these offloads are buggy or misconfigured, they can lead to corrupted skbs or incorrect packet handling, triggering retransmit storms.

  • Diagnosis: Disable NIC offloads one by one and observe if the retransmit errors subside.
    # List available offloads for an interface
    ethtool -k eth0
    
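    # Disable specific offloads one at a time, re-testing after each change
    ethtool -K eth0 tso off  # TCP segmentation offload
    ethtool -K eth0 gso off  # generic segmentation offload
    ethtool -K eth0 rx off   # RX checksum offload
    ethtool -K eth0 tx off   # TX checksum offload
    ethtool -K eth0 sg off   # scatter/gather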
    
  • Fix: Disable the problematic offload feature.
    # Example: Disable Generic Segmentation Offload (GSO)
    ethtool -K eth0 gso off
    
  • Why it works: By forcing the CPU to handle these tasks, you bypass potential hardware bugs or driver issues in the NIC’s offload engine, ensuring that packet processing is handled correctly by the kernel’s software stack.
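Before switching features off one by one, it helps to know which ones are currently enabled and can actually be changed. This is a minimal sketch that filters the output of ethtool -k, assuming the usual "name: on|off [fixed]" line format; toggleable_offloads is a hypothetical helper name.

```shell
# toggleable_offloads: read `ethtool -k` output on stdin and print the names
# of features that are currently on and not marked [fixed].
toggleable_offloads() {
    awk -F': ' '
        /\[fixed\]/ { next }          # the driver does not allow changing these
        $2 ~ /^on/ {
            gsub(/^[ \t]+/, "", $1)   # sub-features are indented
            print $1
        }
    '
}

# Usage (eth0 is an example interface name):
#   ethtool -k eth0 | toggleable_offloads
```

Each name it prints is a candidate to disable in turn while watching the retransmit counters. The names are printed as ethtool -k reports them, so check the ethtool man page for the matching -K keyword.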

Cause 4: TCP Congestion Control Algorithm Misbehavior

The choice of TCP congestion control algorithm (e.g., Cubic, BBR, Reno) and its interaction with network conditions can lead to aggressive behavior that triggers excessive retransmits. Some algorithms might be too optimistic in certain lossy environments.

  • Diagnosis: Check the currently active congestion control algorithm and experiment with others.
    # View current default
    sysctl net.ipv4.tcp_congestion_control
    
    # View available algorithms
    sysctl net.ipv4.tcp_available_congestion_control
    
    # Per-connection: an application can select an algorithm on its own socket
    # with setsockopt() and the TCP_CONGESTION option.
    # For a system-wide test, change the sysctl value:
    sysctl -w net.ipv4.tcp_congestion_control=reno
    
  • Fix: Switch to a more conservative or suitable congestion control algorithm.
    # Example: Switch to Reno for testing
    sysctl -w net.ipv4.tcp_congestion_control=reno
    
  • Why it works: Algorithms like Reno are less aggressive in probing for bandwidth and might be more stable in environments with higher latency or intermittent packet loss, reducing the likelihood of triggering spurious retransmits. BBR is designed to minimize buffer occupancy, which can also help.
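Setting an algorithm that the running kernel has not loaded fails, so it can be worth checking the available list first. A minimal sketch, where cc_is_available is a hypothetical helper and the second argument exists only so the check can be exercised without reading /proc:

```shell
# cc_is_available NAME [LIST]: succeed if NAME appears in the space-separated
# LIST of algorithms. LIST defaults to the kernel's own list.
cc_is_available() {
    name=$1
    list=${2:-$(cat /proc/sys/net/ipv4/tcp_available_congestion_control)}
    for algo in $list; do
        [ "$algo" = "$name" ] && return 0
    done
    return 1
}

# Usage (as root), assuming BBR is built as the tcp_bbr module on this kernel:
#   cc_is_available bbr || modprobe tcp_bbr
#   sysctl -w net.ipv4.tcp_congestion_control=bbr
```

Changing the sysctl affects new connections only, so existing connections need to be re-established before comparing retransmit counts.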

Cause 5: Insufficient System Resources (CPU/Memory)

If the system is heavily loaded, the kernel might struggle to process network packets in a timely manner. An ACK that has already arrived can sit unprocessed until the retransmission timer fires, leading to unnecessary retransmissions even though no packet was actually lost.

  • Diagnosis: Monitor CPU and memory utilization.
    # High CPU usage in the per-CPU softirq threads (named ksoftirqd/0, ksoftirqd/1, ...)
    # pidof does not match those names, so list them with ps instead
    ps -eo pid,psr,pcpu,comm | grep ksoftirqd
    
    # Check for memory pressure
    free -h
    vmstat 1
    
  • Fix: Reduce the load on the system or increase its resources. This might involve optimizing applications, adding more CPU cores, or increasing RAM.
    # Example: spread network interrupt handling across cores (if CPU-bound)
    # Each CPU has exactly one ksoftirqd thread, so the thread count itself is not tunable.
    # Check /proc/interrupts to see which CPUs service the NIC's IRQs, then adjust
    # /proc/irq/<irq>/smp_affinity or enable RPS so that a single core is not the bottleneck.
    # A more direct fix is to optimize applications or upgrade hardware.
    
  • Why it works: Ensuring the kernel has sufficient CPU cycles and memory to process incoming and outgoing packets promptly prevents delays that could otherwise be misinterpreted as packet loss by the TCP stack.
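Whether the kernel is actually falling behind on packet processing can be read from /proc/net/softnet_stat, whose columns are hexadecimal. The sketch below decodes the first three columns (packets processed, packets dropped because the backlog queue was full, and times the softirq budget ran out). softnet_summary is a hypothetical helper, and it labels rows by position, which normally corresponds to the CPU number but may not on systems with offline CPUs.

```shell
# softnet_summary [FILE]: print processed/dropped/time_squeeze counts per row.
# FILE defaults to /proc/net/softnet_stat; values in that file are hexadecimal.
softnet_summary() {
    row=0
    while read -r processed dropped squeezed _rest; do
        # $((0x...)) converts each hexadecimal field to decimal
        echo "row$row processed=$((0x$processed)) dropped=$((0x$dropped)) time_squeeze=$((0x$squeezed))"
        row=$((row + 1))
    done < "${1:-/proc/net/softnet_stat}"
}
```

A dropped or time_squeeze count that keeps growing on one row while the others stay flat suggests that a single core is handling most of the network load.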

Cause 6: Kernel Bug or Driver Issue

In rare cases, a bug in the Linux kernel’s TCP/IP stack or a specific network driver can cause skb corruption or incorrect state management, leading to the call trace.

  • Diagnosis: Examine kernel logs (dmesg) for any suspicious messages related to network drivers or the TCP stack. Try updating the kernel and network driver.
    dmesg | grep -i -E 'tcp|skb|netdev|eth0'
    
  • Fix: Update the kernel and network driver to the latest stable versions.
    # Example: Update kernel (distribution specific)
    # sudo apt update && sudo apt upgrade linux-generic
    
    # Example: Update driver (if available from vendor)
    # Consult NIC vendor documentation for specific driver update procedures.
    
  • Why it works: Patches in newer kernel versions or driver updates often address known bugs and performance issues that could be responsible for the observed behavior.

You’ll likely hit a TCP: Peer discarded fast retransmit error next if the underlying packet loss isn’t fully resolved.
