Kubernetes networking is often described as a "black box," but the reality is that the Container Network Interface (CNI) is the most crucial component, dictating how pods communicate.
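Let’s see Flannel in action; it’s a foundational CNI that’s relatively simple to grasp. Imagine you have two nodes, node-1 and node-2, each running one pod. kubectl run takes the pod name as its first argument, and --overrides sets spec.nodeName so each pod lands on the node we want:

```shell
# node-1: nginx pod
kubectl run nginx-1 --image=nginx --port=80 --overrides='{"apiVersion": "v1", "spec": {"nodeName": "node-1"}}'

# node-2: Apache pod
kubectl run apache-1 --image=httpd --port=80 --overrides='{"apiVersion": "v1", "spec": {"nodeName": "node-2"}}'

# Check placement and pod IPs
kubectl get pods -o wide
# Output will show nginx-1 on node-1 (e.g., 10.244.0.2) and apache-1 on node-2 (e.g., 10.244.1.3)
```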
Now, from nginx-1 on node-1, try to curl the apache-1 pod’s IP:
```shell
kubectl exec nginx-1 -- curl 10.244.1.3
```
If Flannel is configured, this curl will succeed. Flannel essentially creates a virtual network overlay across your nodes, allowing pods on different machines to communicate as if they were on the same local network. It achieves this by encapsulating network packets and sending them over the underlying physical network.
The core problem Kubernetes networking solves is providing a flat, routable network for all pods, regardless of which node they’re scheduled on. This is critical for service discovery and inter-pod communication without complex NAT or routing rules managed by the application itself. The CNI is the pluggable interface that allows different networking solutions to implement this pod networking.
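To make the "flat network" idea concrete, here is a minimal Python sketch, assuming the 10.244.0.0/16 range used in this post is split into one /24 per node (Flannel’s default subnet size). The helper names allocate_node_subnets and owning_node are hypothetical; real allocation is done by Kubernetes or the CNI’s IPAM, not by code like this. The point is only that every pod IP maps to exactly one node, which is what lets any node route to any pod.

```python
import ipaddress

# Assumption: cluster pod CIDR 10.244.0.0/16, carved into one /24 per node.
CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")
NODE_PREFIX = 24


def allocate_node_subnets(nodes, cluster=CLUSTER_CIDR, prefix=NODE_PREFIX):
    """Hypothetical allocator: give each node a distinct, non-overlapping pod subnet."""
    blocks = list(cluster.subnets(new_prefix=prefix))  # 256 blocks for a /16 split into /24s
    if len(nodes) > len(blocks):
        raise ValueError("cluster CIDR has too few node subnets")
    return dict(zip(nodes, blocks))


def owning_node(pod_ip, node_subnets):
    """Return the node whose pod subnet contains pod_ip, or None if no node owns it."""
    addr = ipaddress.ip_address(pod_ip)
    for node, subnet in node_subnets.items():
        if addr in subnet:
            return node
    return None


subnets = allocate_node_subnets(["node-1", "node-2"])
for node, subnet in subnets.items():
    print(node, subnet)                      # node-1 10.244.0.0/24, then node-2 10.244.1.0/24
print(owning_node("10.244.1.3", subnets))    # node-2
```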
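Flannel, by default, uses a VXLAN backend. When nginx-1 sends a packet to 10.244.1.3, the packet leaves the pod through its veth pair into node-1’s cni0 bridge, and node-1’s routing table (programmed by the flanneld agent) sends anything destined for node-2’s pod subnet (10.244.1.0/24 in this example) to the flannel.1 VXLAN device. The kernel then wraps the original packet in a VXLAN header inside an outer UDP/IP packet, with node-1’s IP address as the source and node-2’s IP address as the destination. This packet travels over the physical network to node-2, whose kernel strips the outer headers and the VXLAN header and delivers the original packet to apache-1 through node-2’s cni0 bridge. In this mode flanneld never touches the packets itself; it only keeps each node’s routes, ARP entries, and forwarding database (FDB) entries up to date.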
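The encapsulation step can be sketched in Python. This builds the 8-byte VXLAN header defined in RFC 7348 (a flags byte with the "VNI valid" bit set, then a 24-bit VNI) and models the outer UDP/IP packet as a plain dict rather than real headers; the node IPs 192.168.1.11 and 192.168.1.12 are hypothetical, and 8472 is Flannel’s default VXLAN port. It also shows where Flannel’s usual 1450-byte pod MTU comes from: 50 bytes of outer Ethernet, IPv4, UDP, and VXLAN headers subtracted from a 1500-byte link.

```python
import struct

VXLAN_FLAG_I = 0x08               # "VNI valid" flag from RFC 7348
OUTER_OVERHEAD = 14 + 20 + 8 + 8  # outer Ethernet + IPv4 + UDP + VXLAN headers = 50 bytes


def vxlan_header(vni):
    """Build the 8-byte VXLAN header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def encapsulate(inner_frame, vni, src_node_ip, dst_node_ip, udp_port=8472):
    """Model the outer packet as a dict; on a real node the kernel builds these headers."""
    return {
        "src": src_node_ip,
        "dst": dst_node_ip,
        "udp_dport": udp_port,
        "payload": vxlan_header(vni) + inner_frame,
    }


def decapsulate(outer):
    """Strip the VXLAN header and return (vni, original inner frame)."""
    flags_word, vni_word = struct.unpack("!II", outer["payload"][:8])
    if not (flags_word >> 24) & VXLAN_FLAG_I:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8, outer["payload"][8:]


inner = b"inner frame: 10.244.0.2 -> 10.244.1.3"
outer = encapsulate(inner, vni=1, src_node_ip="192.168.1.11", dst_node_ip="192.168.1.12")
vni, recovered = decapsulate(outer)
print(vni, recovered == inner)  # 1 True
print(1500 - OUTER_OVERHEAD)    # 1450
```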
The key levers you control with Flannel are primarily its backend (VXLAN, host-gw, UDP) and its subnet management.
- Backend: vxlan is the default and most common choice, creating an overlay network. host-gw is simpler and faster because it skips encapsulation and installs plain routes with each peer node as the next hop, but it requires direct Layer 2 connectivity between all nodes.
- Subnet manager: flanneld can store its subnet leases in etcd or, as is typical in Kubernetes deployments, use the Kubernetes API (the --kube-subnet-mgr flag) and take each node’s pod subnet from its spec.podCIDR.

For example, a typical Flannel ConfigMap for VXLAN looks like this:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
data:
  cni-conf.json: |
    {
      "name": "flannel",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan",
        "VNI": 1,
        "Port": 8472
      }
    }
```

Here, Network in net-conf.json defines the pod IP address space (it should match the cluster’s pod CIDR), Backend.Type selects the encapsulation method, and VNI and Port set the VXLAN network identifier and UDP port. cni-conf.json only tells the container runtime to invoke the flannel CNI plugin; the network and backend settings belong in net-conf.json.
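To see how the backend choice changes what ends up on each node, here is a Python sketch that reads the net-conf.json from the ConfigMap above and prints ip-route-style lines of roughly the shape flanneld programs. Everything beyond the ConfigMap is an assumption: the node IPs, the uplink name eth0, the hypothetical routes_for helper, and the detail that with vxlan the gateway is the peer’s flannel.1 VXLAN device address (the network address of its pod subnet).

```python
import ipaddress
import json

# net-conf.json from the ConfigMap above
NET_CONF = """
{
  "Network": "10.244.0.0/16",
  "Backend": {"Type": "vxlan", "VNI": 1, "Port": 8472}
}
"""


def routes_for(local_node, peers, backend_type, cluster, uplink="eth0"):
    """Return route strings local_node needs for its peers (hypothetical helper).

    peers maps node name -> (node_ip, pod_subnet); all values are illustrative.
    """
    routes = []
    for node, (node_ip, subnet) in sorted(peers.items()):
        if node == local_node:
            continue  # local pods are reached through the cni0 bridge instead
        net = ipaddress.ip_network(subnet)
        if not net.subnet_of(cluster):
            raise ValueError(f"{net} is outside the cluster network {cluster}")
        if backend_type == "vxlan":
            # Gateway is the peer's flannel.1 address; the kernel encapsulates on flannel.1
            routes.append(f"{net} via {net.network_address} dev flannel.1 onlink")
        elif backend_type == "host-gw":
            # Gateway is the peer node itself, so the nodes must share a Layer 2 segment
            routes.append(f"{net} via {node_ip} dev {uplink}")
        else:
            raise ValueError(f"backend not modeled in this sketch: {backend_type}")
    return routes


conf = json.loads(NET_CONF)
cluster = ipaddress.ip_network(conf["Network"])
peers = {
    "node-1": ("192.168.1.11", "10.244.0.0/24"),
    "node-2": ("192.168.1.12", "10.244.1.0/24"),
}
print(routes_for("node-1", peers, conf["Backend"]["Type"], cluster))
print(routes_for("node-1", peers, "host-gw", cluster))
```

The vxlan route keeps the physical network unaware of pod subnets, while the host-gw route hands packets straight to the peer node, which is why host-gw needs nodes on a shared Layer 2 segment.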
Calico takes a different approach: in its BGP mode it routes pod traffic natively instead of building an overlay. Skipping encapsulation removes per-packet overhead, and because packets on the wire carry real pod IPs, ordinary network tools can see pod-to-pod traffic directly. Calico runs a BGP daemon (BIRD) on each node, and each node advertises the pod address blocks it hosts. When a pod on node-1 sends traffic to a pod on node-2, node-1 has already learned node-2’s pod routes, so it forwards the packet straight to node-2 without an encapsulation layer. By default the nodes peer with each other in a full mesh; larger clusters typically peer with route reflectors or with your top-of-rack routers instead. Either way, the underlying network must deliver packets addressed to pod IPs: either the nodes share a Layer 2 segment, or the routers in between learn the pod routes over BGP.
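Here is a toy Python model of what the BGP full mesh distributes, assuming three hypothetical nodes whose IPs and pod blocks are made up (Calico hands out /26 blocks by default). Each node advertises only its own blocks, so every other node learns a route of the form "block via node IP" and can forward without encapsulation. It also shows why large clusters move to route reflectors: a full mesh needs n(n-1)/2 BGP sessions.

```python
from itertools import combinations

# Hypothetical cluster: node IPs and /26 pod blocks are illustrative only.
NODES = {
    "node-1": ("192.168.1.11", "10.244.0.0/26"),
    "node-2": ("192.168.1.12", "10.244.0.64/26"),
    "node-3": ("192.168.1.13", "10.244.0.128/26"),
}


def full_mesh_sessions(nodes):
    """In a full node-to-node mesh, every pair of nodes has exactly one BGP session."""
    return list(combinations(sorted(nodes), 2))


def learned_routes(local, nodes):
    """Routes `local` learns: each peer advertises its own block with its node IP as next hop."""
    return {block: ip for name, (ip, block) in nodes.items() if name != local}


print(len(full_mesh_sessions(NODES)))  # 3 sessions for 3 nodes
for block, next_hop in sorted(learned_routes("node-1", NODES).items()):
    print(f"{block} via {next_hop}")
```

With 100 nodes the same formula gives 4,950 sessions, each node maintaining 99 of them, which is the usual motivation for route reflectors.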
Calico’s configuration is often more involved, especially when using BGP. You’ll typically configure IPPools and BGPConfiguration resources. For example, a basic IP pool:
```yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
```
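This defines the CIDR for your pods, and ipipMode: Never disables IP-in-IP encapsulation, so cross-node traffic relies on native routing. If you’re using BGP, you’d also have a BGPConfiguration resource for cluster-wide settings (such as the AS number and whether the node-to-node mesh is enabled) and BGPPeer resources for any external peers. The natOutgoing: true setting is important: traffic from pods to destinations outside the Calico IP pools is NATed to the node’s IP address.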
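The natOutgoing rule can be expressed as a small Python function. This is a simplified sketch, assuming a single pool like the one above and a hypothetical node IP of 192.168.1.11: traffic from a pod in a natOutgoing pool to a destination outside every IP pool leaves with the node’s IP as its source, while pod-to-pod traffic keeps its pod IP. Calico programs this into each node’s data plane (for example, as iptables rules); the needs_snat and egress_source helpers are illustrative only.

```python
import ipaddress

# Assumption: one pool matching the IPPool above.
POOLS = [{"cidr": ipaddress.ip_network("10.244.0.0/16"), "natOutgoing": True}]


def needs_snat(src_ip, dst_ip, pools=POOLS):
    """True when the source is in a natOutgoing pool and the destination is in no pool."""
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    src_pool = next((p for p in pools if src in p["cidr"]), None)
    if src_pool is None or not src_pool["natOutgoing"]:
        return False
    # Destinations inside any IP pool are other pods: they see the real pod IP
    return not any(dst in p["cidr"] for p in pools)


def egress_source(src_ip, dst_ip, node_ip, pools=POOLS):
    """Source address the packet carries once it leaves the node."""
    return node_ip if needs_snat(src_ip, dst_ip, pools) else src_ip


print(egress_source("10.244.0.2", "10.244.1.3", "192.168.1.11"))    # 10.244.0.2
print(egress_source("10.244.0.2", "203.0.113.10", "192.168.1.11"))  # 192.168.1.11
```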
While Flannel is simple and effective for many use cases, it does not enforce Kubernetes NetworkPolicy at all. Calico offers fine-grained network policy enforcement and the option of direct routing, which avoids encapsulation overhead. Network policy is a first-class feature in Calico: it enforces standard Kubernetes NetworkPolicy objects and adds its own NetworkPolicy and GlobalNetworkPolicy resources, letting you define detailed ingress and egress rules for your pods, which is a significant advantage for security.
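The core of Kubernetes NetworkPolicy, which Calico enforces, is an allow-list: a pod selected by at least one ingress policy accepts only traffic that some rule explicitly allows, and a pod selected by no policy accepts everything. The Python below is a deliberately simplified model with hypothetical classes: it treats every policy as an ingress policy, matches on labels and ports only, and ignores namespaces, IP blocks, and egress. Calico’s own policy resources add features it does not cover, such as policy ordering and explicit deny rules.

```python
from dataclasses import dataclass, field


@dataclass
class IngressRule:
    from_labels: dict  # source pod must carry all of these labels
    ports: set         # allowed destination ports


@dataclass
class Policy:
    pod_selector: dict                           # pods the policy applies to ({} = all pods)
    ingress: list = field(default_factory=list)  # empty list = deny all ingress


def matches(selector, labels):
    """A label selector matches when every key/value pair is present on the pod."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies, src_labels, dst_labels, port):
    selecting = [p for p in policies if matches(p.pod_selector, dst_labels)]
    if not selecting:
        return True  # no policy selects the pod, so it is not isolated
    # Isolated: allowed only if some rule in some selecting policy permits it
    return any(
        matches(rule.from_labels, src_labels) and port in rule.ports
        for p in selecting
        for rule in p.ingress
    )


policies = [
    Policy({"app": "apache"}),                                         # default deny for apache pods
    Policy({"app": "apache"}, [IngressRule({"app": "nginx"}, {80})]),  # allow nginx pods on port 80
]
print(ingress_allowed(policies, {"app": "nginx"}, {"app": "apache"}, 80))  # True
print(ingress_allowed(policies, {"app": "debug"}, {"app": "apache"}, 80))  # False
```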
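The one thing that trips many people up with Calico is assuming it will choose between direct routing and encapsulation on its own. It does not probe your network: encapsulation is set per IP pool with ipipMode (or vxlanMode), which accepts Always, CrossSubnet, or Never. With Always, all cross-node pod traffic is wrapped in IP-in-IP (IPIP); with CrossSubnet, only traffic between nodes in different subnets is encapsulated; with Never, as in the pool above, cross-node traffic fails unless the underlying network can route pod IPs. Getting the direct-routing performance benefits therefore takes careful network planning and explicit pool configuration.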
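The per-pool ipipMode values boil down to a simple decision for each cross-node hop. The uses_ipip function below is a hypothetical sketch of that decision, assuming CrossSubnet compares the destination node’s IP against the source node’s own subnet; the example IPs and /24 prefix are made up. vxlanMode takes the same three values and follows the same logic with VXLAN instead of IP-in-IP.

```python
import ipaddress


def uses_ipip(mode, src_node_ip, dst_node_ip, node_prefix_len):
    """Decide whether a pod packet from src node to dst node is IP-in-IP encapsulated."""
    if mode == "Always":
        return True
    if mode == "Never":
        return False
    if mode == "CrossSubnet":
        # Encapsulate only when the destination node is outside the source node's subnet
        src_subnet = ipaddress.ip_interface(f"{src_node_ip}/{node_prefix_len}").network
        return ipaddress.ip_address(dst_node_ip) not in src_subnet
    raise ValueError(f"unknown ipipMode: {mode}")


print(uses_ipip("CrossSubnet", "192.168.1.11", "192.168.1.12", 24))  # False: same subnet
print(uses_ipip("CrossSubnet", "192.168.1.11", "10.0.5.20", 24))     # True: crosses subnets
```

CrossSubnet is often the practical middle ground: nodes in the same subnet get direct routing, and only hops through a router pay the encapsulation cost.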
The next step in advanced Kubernetes networking is often exploring features like network policy enforcement or more performant data planes.