The Linux network stack is a surprisingly elegant piece of engineering that often gets treated as a black box, but its internal workings are fundamental to understanding network performance and debugging.
Let’s trace a simple send() call:
Imagine a Python script doing socket.sendall(data).
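import socket
host = '192.168.1.100'  # example server address
port = 8080
data = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
# The with-block closes the socket even if connect() or sendall() raises.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((host, port))
    s.sendall(data)  # keeps calling send() until every byte is queued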
When s.sendall(data) is called, the Python interpreter calls the C library's send() wrapper, repeating the call until every byte has been accepted. Each call traps into the kernel. This is the first boundary crossed: user space to kernel space.
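To make the relationship between sendall() and send() concrete, here is a minimal sketch of the loop that sendall() performs on your behalf. The helper name send_all is hypothetical, and CPython's real implementation lives in C and also handles timeouts and signals, so treat this as an illustration rather than the actual code.

```python
import socket


def send_all(sock: socket.socket, data: bytes) -> int:
    """Illustrative stand-in for sendall(): loop until every byte is queued."""
    view = memoryview(data)  # lets us slice without copying the payload
    total = 0
    while total < len(view):
        # send() may accept fewer bytes than offered; it returns the count taken.
        total += sock.send(view[total:])
    return total
```

Each pass through the loop is one system call, which is why a single sendall() can show up as several entries in a trace when the payload is large.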
The kernel's network stack then takes over. On x86-64 Linux, glibc implements send() on top of the sendto system call; sendmsg is its more general sibling. Either way, the call enters the kernel's socket layer (__sys_sendto or __sys_sendmsg), which dispatches through sock_sendmsg() to the protocol's own handler, tcp_sendmsg() in this case. The kernel now has the data, but it is still associated with a struct sock (the kernel's representation of a socket) and a specific protocol (TCP here).
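The TCP layer breaks the data into segments no larger than the MSS (maximum segment size) and fills in the TCP header: ports, sequence and acknowledgment numbers, flags, and the advertised window. It also handles flow control and congestion control. The data is first copied into the socket's send buffer, whose size is autotuned within the bounds set by net.ipv4.tcp_wmem (net.core.wmem_max caps sizes requested explicitly with SO_SNDBUF). If that buffer is full, send() on a blocking socket sleeps until space becomes available; on a non-blocking socket it fails with EAGAIN instead.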
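The full-buffer behavior is easy to observe on loopback. The sketch below assumes a Linux host where binding to 127.0.0.1 is allowed; the function name fill_send_buffer is hypothetical. It switches the sender to non-blocking mode so that a full buffer surfaces as BlockingIOError rather than a sleeping process.

```python
import socket


def fill_send_buffer(chunk_size: int = 64 * 1024) -> int:
    """Queue bytes on a loopback TCP connection until the kernel refuses more."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    peer, _ = listener.accept()  # the peer never reads, so the buffers fill up
    client.setblocking(False)
    chunk = b"x" * chunk_size
    queued = 0
    try:
        while True:
            queued += client.send(chunk)
    except BlockingIOError:
        # EAGAIN/EWOULDBLOCK: a blocking socket would sleep here instead.
        pass
    finally:
        client.close()
        peer.close()
        listener.close()
    return queued


if __name__ == "__main__":
    print("bytes accepted before the buffers filled:", fill_send_buffer())
```

The number printed varies between machines because it depends on both the sender's send buffer and the receiver's receive buffer, each of which the kernel autotunes.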
Next comes the IP layer, which encapsulates the TCP segment in an IP packet. The IP header carries the source and destination IP addresses, a TTL (Time To Live), and the protocol number (6 for TCP). The routing decision belongs here as well: a lookup in the routing table (shown by ip route show) selects the outgoing interface and the next hop.
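For a feel of what the IP layer prepends, the following sketch packs a 20-byte IPv4 header with no options and computes its checksum in user space. The function names and the source address used in the usage note are assumptions made for illustration; the kernel builds this header itself and does not expose anything like these helpers.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """Ones' complement sum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      ttl: int = 64, proto: int = socket.IPPROTO_TCP) -> bytes:
    """Pack a minimal IPv4 header (no options) for a payload of payload_len bytes."""
    version_ihl = (4 << 4) | 5   # IPv4, header length of 5 32-bit words
    flags_frag = 0x4000          # Don't Fragment set, fragment offset 0
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + payload_len, 0, flags_frag,
        ttl, proto, 0,           # checksum field is zero while summing
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    checksum = internet_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]
```

Calling build_ipv4_header("192.168.1.10", "192.168.1.100", 57) yields a header whose byte 9 is 6 (TCP). A receiver validates it by running the same checksum over all 20 bytes and expecting zero.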
Finally, the link layer (e.g., Ethernet). The IP packet is wrapped in a link-layer frame. For Ethernet, that means an Ethernet header with source and destination MAC addresses plus a trailer, the FCS (Frame Check Sequence), which the NIC hardware usually computes. The source MAC is the interface's own. The destination MAC is resolved via ARP (Address Resolution Protocol): for a destination on the local network it is the destination host's MAC, and for a remote destination it is the MAC of the next-hop gateway.
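The framing step can be sketched in a few lines. The MAC addresses in the usage note are made up, the helper names are hypothetical, and the FCS handling assumes the common convention of appending zlib.crc32 of the frame in little-endian byte order. In practice the NIC normally does this part in hardware.

```python
import struct
import zlib

ETHERTYPE_IPV4 = 0x0800
MIN_PAYLOAD = 46  # shorter payloads are zero-padded to reach the 64-byte minimum frame


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_ethernet_frame(dst_mac: str, src_mac: str, payload: bytes,
                         ethertype: int = ETHERTYPE_IPV4) -> bytes:
    """Wrap payload in an Ethernet II header and append a CRC-32 FCS."""
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)
    body = header + payload.ljust(MIN_PAYLOAD, b"\x00")
    # Assumption: FCS is the CRC-32 of header plus payload, least significant byte first.
    fcs = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return body + fcs
```

For example, build_ethernet_frame("aa:bb:cc:dd:ee:ff", "02:00:00:00:00:01", b"hi") returns a 64-byte frame: 14 bytes of header, 46 bytes of padded payload, and 4 bytes of FCS.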
The netdev (network device) layer then queues this frame, normally through a queueing discipline (qdisc), and hands it off to the network driver for the outgoing interface (e.g., eth0). The driver places the frame on a transmit ring, and the NIC typically fetches it via DMA (Direct Memory Access) and puts it on the wire.
This entire process, from user-space send() to the network interface card (NIC) transmitting bits, is managed by the kernel’s network stack.
Here's what that looks like in action with strace. The script may finish before strace can attach to a background process with -p, so it is simpler to launch the script under strace directly:
# Run the Python script under strace, following any child processes
strace -f -s 100 -e trace=sendto,sendmsg,write python your_script.py
You'll see entries like:
sendto(3, "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n", 37, 0, NULL, 0) = 37
This shows the system call being made from user space with the 37 bytes of data. On x86-64 Linux it appears as sendto because glibc's send() is a thin wrapper around that call. From here on, everything happens inside the kernel.
The core problem this stack solves is abstracting the complexity of diverse network hardware and protocols. It provides a consistent API (sockets) to applications while handling the low-level details of framing, addressing, routing, and transmission.
The key components you control are:
- Sockets: The application-level endpoint. You choose AF_INET (IPv4) or AF_INET6 (IPv6), and SOCK_STREAM (TCP) or SOCK_DGRAM (UDP).
- TCP/IP Stack Parameters: Tunables in /proc/sys/net/ control buffer sizes, timeouts, congestion control algorithms, and more. For example, net.ipv4.tcp_rmem and net.ipv4.tcp_wmem control the TCP receive and send buffer sizes.
- Routing Table: ip route show displays the table that dictates how packets are forwarded to different destinations.
- Network Interface Configuration: ip addr show and ip link show display IP addresses, MAC addresses, and interface states; the corresponding ip addr and ip link subcommands change them.
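The tunables under /proc/sys/net/ are plain text files, so they can be read from a script as easily as with sysctl. The sketch below uses hypothetical helper names and assumes a Linux host; it returns None where a file is missing or unreadable. The simple dot-to-slash mapping does not cover names that themselves contain dots, such as some interface names.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name such as net.ipv4.tcp_wmem to its /proc/sys file."""
    return PROC_SYS.joinpath(*name.split("."))


def read_sysctl(name: str):
    """Return the tunable's whitespace-separated fields, or None if unavailable."""
    try:
        return sysctl_path(name).read_text().split()
    except OSError:
        return None


if __name__ == "__main__":
    for name in ("net.ipv4.tcp_wmem", "net.ipv4.tcp_rmem", "net.core.wmem_max"):
        print(name, "=", read_sysctl(name))
```

For tcp_wmem and tcp_rmem the three fields are the minimum, default, and maximum buffer sizes in bytes.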
A critical, often overlooked, detail is the role of the struct sk_buff (socket buffer) in the kernel. This is the fundamental data structure that represents a network packet as it traverses the stack. It contains pointers to the actual data, metadata about the packet (protocol, length, interface), and is passed between different layers of the network stack. Different layers add or strip headers by manipulating pointers and lengths within this single sk_buff structure, avoiding costly data copying.
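The pointer manipulation is easier to picture with a toy model. The class below is not the kernel API: it is a hypothetical Python stand-in whose method names echo the kernel helpers skb_put(), skb_push(), and skb_pull(), and it tracks only a data offset and a tail offset into one preallocated buffer.

```python
class ToySkb:
    """Toy model of sk_buff bookkeeping: one buffer, moving data/tail offsets."""

    def __init__(self, size: int, headroom: int):
        self.buf = bytearray(size)
        self.data = headroom  # reserved headroom, in the spirit of skb_reserve()
        self.tail = headroom

    def put(self, payload: bytes) -> None:
        """Append payload at the tail (cf. skb_put)."""
        end = self.tail + len(payload)
        if end > len(self.buf):
            raise ValueError("not enough tailroom")
        self.buf[self.tail:end] = payload
        self.tail = end

    def push(self, header: bytes) -> None:
        """Prepend a header by moving the data offset backwards (cf. skb_push)."""
        start = self.data - len(header)
        if start < 0:
            raise ValueError("not enough headroom")
        self.buf[start:self.data] = header
        self.data = start

    def pull(self, length: int) -> bytes:
        """Strip length bytes from the front, as the receive path does (cf. skb_pull)."""
        if length > self.tail - self.data:
            raise ValueError("pull beyond end of data")
        removed = bytes(self.buf[self.data:self.data + length])
        self.data += length
        return removed

    def view(self) -> bytes:
        """Return the bytes currently between data and tail."""
        return bytes(self.buf[self.data:self.tail])
```

On the send path the payload is written once with put(), and each layer then calls push() for its own header (TCP, then IP, then Ethernet). The payload bytes never move, which is the copy avoidance described above.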
After successfully sending a packet, the next immediate hurdle you’ll encounter is ensuring that incoming packets are correctly processed and delivered back to the application, often involving understanding the TCP connection state and buffer management on the receiving end.