Flow tables (flowtables) in nftables are a game-changer for high-performance packet forwarding because they let packets of already-established connections skip most of the kernel’s forwarding path. By default this fast path runs in software at the ingress hook; with a capable NIC and driver, the same flows can be pushed down to the hardware (network interface cards or dedicated ASICs) so that frequently seen traffic bypasses the CPU.
Let’s see this in action. Imagine a simple nftables rule to accept SSH traffic from a specific IP address:
# First, create the table and the input chain
nft add table ip filter
nft add chain ip filter input { type filter hook input priority 0\; }
# Then, create a set for the allowed IP (the table must exist before the set)
nft add set ip filter allowed_ssh_ips { type ipv4_addr\; }
nft add element ip filter allowed_ssh_ips { 192.168.1.100 }
# Finally, the rule to accept SSH from the set
nft add rule ip filter input ip saddr @allowed_ssh_ips tcp dport 22 accept
Without flow tables, every packet matching this rule traverses the kernel’s network stack, hits the nftables hooks, and is evaluated against the ruleset before it is accepted. That cost is paid per packet, for the whole lifetime of the connection.
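With nftables flow tables, the picture changes for forwarded traffic. Flowtables do not accelerate packets delivered to the local host, so assume the SSH connection from 192.168.1.100 passes through this machine acting as a router. The first packets of the connection are processed by the CPU on the classic path. Once connection tracking considers the connection established and a rule with a flow add statement matches it, the kernel creates a "flow entry" in the flowtable. The entry is keyed on the connection itself (source and destination addresses, ports, protocol, and input interface), not on a broad pattern like "anything from 192.168.1.100 to port 22." Subsequent packets of that flow are matched at the ingress hook and forwarded directly, skipping the forward chain; if hardware offload is enabled and supported, the NIC handles them and the CPU is bypassed almost entirely. The CPU gets involved again when a flow is set up or torn down (for example on a TCP FIN or RST), when an entry expires, or when a packet doesn’t match any existing entry.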
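For a concrete picture, here is a minimal sketch of what enabling a flowtable could look like for SSH traffic forwarded through the box. The interface names eth0 and eth1, the flowtable name ft, and the output path are assumptions for illustration; the flowtable block and the flow add statement are standard nftables syntax. The snippet only writes the ruleset to a file and does not touch the running firewall.

```shell
# Sketch: write a ruleset with a software flowtable to a file.
# eth0/eth1, the name "ft", and the output path are placeholders.
RULESET="${RULESET:-./flowtable-example.nft}"

cat > "$RULESET" <<'EOF'
table ip filter {
    flowtable ft {
        hook ingress priority 0;
        devices = { eth0, eth1 };
    }

    chain forward {
        type filter hook forward priority 0; policy drop;

        # Ask the kernel to put SSH connections from the allowed host on the fast path
        ip saddr 192.168.1.100 tcp dport 22 flow add @ft

        # Packets still on the classic path are filtered as usual
        ct state established,related accept
        ip saddr 192.168.1.100 tcp dport 22 accept
    }
}
EOF

echo "wrote $RULESET"
```

To try it, nft -c -f ./flowtable-example.nft (typically as root) should check the file without applying it, and nft -f loads it. Note that flow add is not a verdict: the packet that triggers it still continues to the rules below.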
The core problem nftables flow tables solve is the CPU bottleneck in network filtering and forwarding, especially in high-throughput environments like routers, firewalls, or network appliances. Traditional software-based packet filtering, while flexible, spends CPU cycles evaluating the ruleset for every packet, making it difficult to scale to line-rate speeds. Flow tables move the common, repetitive decisions for established flows to a fast path, in software or in specialized hardware, freeing up the CPU for more complex tasks such as setting up new connections or other application-level processing.
Internally, nothing is offloaded automatically: nftables does not scan the ruleset for rules that look offloadable. You declare a flowtable object (its hook, priority, and the devices it covers) and add a rule, typically in the forward chain, that selects which connections to hand over. The nft tool sends this configuration to the kernel over netlink. When hardware offload is requested, the kernel in turn passes each flow entry to the network driver, which programs the NIC’s flow table. When a packet arrives, the fast path checks the flowtable first. If a match is found, the packet is forwarded immediately (with NAT applied if the connection uses it, and the TTL decremented) without walking the ruleset. If no match is found, the packet is punted to the classic software path, and if that processing reaches the flow add rule for an established connection, a new entry is created.
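One way to observe this split between fast path and classic path is the connection tracking table. The kernel documentation notes that flows sitting in a flowtable are listed by conntrack -L with an [OFFLOAD] tag; hardware-offloaded flows are expected to carry [HW_OFFLOAD], but treat that second tag as an assumption and check it against your conntrack-tools version. The helper name count_offloaded is made up for this sketch.

```shell
# count_offloaded: read `conntrack -L` style lines on stdin and print how
# many entries carry each offload tag. Hypothetical helper for illustration.
count_offloaded() {
    awk '
        /\[HW_OFFLOAD\]/ { hw++; next }   # flow handled by the NIC
        /\[OFFLOAD\]/    { sw++; next }   # flow on the software fast path
                         { slow++ }       # everything else: classic path
        END { printf "software=%d hardware=%d slow_path=%d\n", sw, hw, slow }
    '
}

# Typical use (needs root and the conntrack tool):
#   conntrack -L 2>/dev/null | count_offloaded
```

If the software and hardware counts stay at zero while traffic is being forwarded, the flow add rule is probably not being reached for those connections.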
The key levers you control are the flowtable definition and the rule that feeds it. The definition lists the devices whose ingress traffic is checked, the hook priority, and optionally flags offload to request hardware offload. The feeding rule decides which connections qualify: matching on meta l4proto { tcp, udp } hands over all TCP and UDP connections, while a narrower match such as ip daddr 1.2.3.4 limits the fast path to selected traffic. Priority matters as well: lower numerical values run earlier at a hook, and once a flow is in the flowtable, its packets no longer traverse the forward chain, so rules, counters, and logging there stop seeing them. Set lookups (@set) remain the efficient way to match many addresses in the feeding rule, compared with a long run of individual rules.
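To make those levers concrete, here is a small sketch that prints a flowtable definition from its parameters. The function name flowtable_def and its argument order are assumptions made up for this example; the emitted hook, devices, and flags offload lines are regular nftables flowtable syntax, and priority 0 is simply the value used elsewhere in this article.

```shell
# flowtable_def NAME OFFLOAD DEV... : print a flowtable definition.
# OFFLOAD is "yes" to request hardware offload via "flags offload".
# Hypothetical helper; it only prints text and never calls nft.
flowtable_def() {
    name=$1
    offload=$2
    shift 2
    devices=$(printf '%s, ' "$@")   # e.g. "eth0, eth1, "
    devices=${devices%, }           # strip the trailing ", "

    printf 'flowtable %s {\n' "$name"
    printf '    hook ingress priority 0;\n'
    printf '    devices = { %s };\n' "$devices"
    if [ "$offload" = yes ]; then
        printf '    flags offload;\n'
    fi
    printf '}\n'
}

# Example: a hardware-offload flowtable covering two placeholder interfaces
flowtable_def ft yes eth0 eth1
```

The printed block belongs inside a table definition. Dropping the offload argument to "no" yields the software-only variant, which is a reasonable starting point before involving the NIC.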
What most people don’t realize is that the decision to offload to hardware isn’t solely based on nftables wanting to; it’s a negotiation. The driver and the hardware must explicitly support flowtable offload and the type of flow entry required for a given connection. If the hardware doesn’t have a matching capability, the traffic will remain software-processed, even if it looks offloadable. This means the actual performance gains depend on both the nftables configuration and the NIC’s capabilities, often exposed through driver parameters or specific kernel modules.
The next concept you’ll likely encounter is how to monitor and debug which flows are actually being offloaded to hardware, and how to tune your rules for maximum offload efficiency.