Linux Namespaces and cgroups: Containers from Scratch (2026)

Linux namespaces and cgroups are the bedrock upon which containerization technologies like Docker and Kubernetes are built, but understanding them requires peeling back layers of abstraction and getting into the kernel’s gritty details.

Let’s see them in action. Imagine we’re about to create a very basic, single-process "container" that has its own isolated view of the network stack.

First, we create a new network namespace, along with new PID, mount, UTS, and IPC namespaces.

sudo unshare --net --mount-proc --pid --uts --ipc --fork /bin/bash

Inside this new shell (which is our "container"), ip addr show looks very different. You'll see only a lo interface, still in the DOWN state, and none of the host's interfaces such as eth0 or wlan0. This is because --net created a new network namespace with its own, initially empty network configuration. --mount-proc implies a new mount namespace and mounts a fresh /proc inside it, which tools like ps need in order to list the processes of the new PID namespace rather than the host's. --pid isolates process IDs, --uts isolates the hostname and NIS domain name, and --ipc isolates System V IPC objects and POSIX message queues. --fork makes unshare run /bin/bash as a child process instead of replacing itself with it. This is required with --pid: the calling process never moves into a new PID namespace, only its subsequently created children do, so the forked shell becomes PID 1 inside it.
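One way to confirm that the shell really sits in new namespaces is to compare the symbolic links under /proc/<PID>/ns: two processes share a namespace exactly when the corresponding links resolve to the same type:[inode] string. The following is a minimal sketch; ns_id and same_ns are hypothetical helper names, not part of any tool.

```sh
#!/bin/sh
# Print a process's namespace identifier, e.g. "net:[<inode number>]".
# $1 = PID (or "self"), $2 = namespace type: net, pid, mnt, uts, ipc, user, cgroup.
ns_id() {
  readlink "/proc/$1/ns/$2"
}

# Succeed only if both links are readable and point to the same namespace.
# $1, $2 = PIDs to compare, $3 = namespace type.
same_ns() {
  a=$(ns_id "$1" "$3") && b=$(ns_id "$2" "$3") && [ "$a" = "$b" ]
}
```

Running ns_id $$ net on the host and again inside the container shell should print different inode numbers, while ns_id $$ user should print the same value in both, because that unshare command did not request a new user namespace. Reading the links of another user's process (for example, one owned by root) needs matching privileges, so run cross-boundary comparisons with sudo; lsns from util-linux gives a similar overview.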

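Now, let's look at cgroups. Cgroups (control groups) handle resource limiting and accounting. They let you allocate, restrict, and prioritize system resources such as CPU time, memory, block I/O, and process counts for a group of processes. Network bandwidth is not limited by cgroups directly; cgroup v1's net_cls and net_prio controllers only tag or prioritize traffic for other tools such as tc.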
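Every process already belongs to exactly one cgroup in the v2 hierarchy, recorded as the line starting with 0:: in /proc/<PID>/cgroup. A minimal helper to extract that path, assuming a cgroup v2 (or hybrid) host; cgroup_v2_path is a hypothetical name:

```sh
#!/bin/sh
# Extract the cgroup v2 path from a file in /proc/<PID>/cgroup format.
# cgroup v2 entries look like "0::<path>"; cgroup v1 lines are ignored.
cgroup_v2_path() {
  sed -n 's/^0:://p' "$1"
}
```

For example, ls "/sys/fs/cgroup$(cgroup_v2_path /proc/$$/cgroup)" lists the interface files of the current shell's cgroup (cgroup.procs plus controller files such as cpu.max or memory.max, depending on which controllers are enabled), and cat /sys/fs/cgroup/cgroup.controllers shows which controllers the kernel offers at the root.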

Consider a simple scenario where we want to limit a process to using only a fraction of a CPU core. We’ll use the systemd-run command, which leverages cgroups behind the scenes.

# Create a transient scope unit that runs a command under a CPU limit
sudo systemd-run --scope --slice=my-limited-slice \
    --property=CPUQuota=50% \
    sleep infinity

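This command runs sleep infinity inside a transient scope unit, which systemd places in a slice named my-limited-slice.slice (systemd adds the .slice suffix). --property=CPUQuota=50% is applied to that scope's cgroup, not to the slice, so the processes in the scope can use at most 50% of one CPU core's time. Unlike a service, a scope unit does not have systemd fork and exec the process: systemd-run runs the command itself and registers it in the new scope, a temporary cgroup that lives only as long as the processes in it. Because the command runs in the foreground, the terminal stays attached to sleep until you interrupt it, so use a second terminal to inspect the cgroup.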
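For comparison, here is roughly what CPUQuota=50% amounts to when done by hand on a cgroup v2 host: a directory under /sys/fs/cgroup, a cpu.max value, and a PID written to cgroup.procs. This is a minimal sketch, not systemd's actual code path. The helper names and the cgroup name demo are hypothetical, CGROOT is an override added only so the functions can be tried against a scratch directory, and writing to the real hierarchy requires root.

```sh
#!/bin/sh
# Convert a percentage of one CPU into a cpu.max line: "<quota> <period>".
# Both values are in microseconds; 100000 (100 ms) is the kernel's default period.
cpu_max_for_percent() {
  percent=$1
  period=${2:-100000}
  echo "$(( period * percent / 100 )) $period"
}

# Hypothetical helper: create a CPU-limited cgroup and move a process into it.
# Assumes cgroup v2 mounted at $CGROOT (default /sys/fs/cgroup) and root privileges.
limit_cpu() {
  name=$1 pct=$2 pid=$3
  root=${CGROOT:-/sys/fs/cgroup}
  # Let child cgroups of the root use the cpu controller (no-op if already enabled).
  echo "+cpu" > "$root/cgroup.subtree_control" || return 1
  mkdir -p "$root/$name" || return 1
  cpu_max_for_percent "$pct" > "$root/$name/cpu.max" || return 1
  # Writing a PID to cgroup.procs moves that whole process into the cgroup.
  echo "$pid" > "$root/$name/cgroup.procs"
}
```

As root, limit_cpu demo 50 "$PID" should leave 50000 100000 in /sys/fs/cgroup/demo/cpu.max. On a systemd-managed host, systemd expects to own the cgroup tree, so treat this as an experiment and remove the group with rmdir once its processes have exited.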

If you ran a CPU-intensive task in this scope instead (e.g., by attaching another process to it or by running a command that spawns many busy threads), it would be throttled once it hit the 50% CPU limit. You can inspect the cgroup settings directly in the /sys/fs/cgroup filesystem. On a cgroup v2 host, a dash in a slice name denotes nesting, so my-limited-slice.slice lives at /sys/fs/cgroup/my.slice/my-limited.slice/my-limited-slice.slice/, and because the command ran under sudo, it sits in the system hierarchy rather than under user.slice. The quota is set on the scope's directory inside that slice; cat /proc/<PID>/cgroup, using the PID of the sleep process, shows its exact path. The cpu.max file in that directory should read 50000 100000, meaning 50000 microseconds of CPU time are allowed per 100000-microsecond period (which is 50%).
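To check the limit and watch it take effect, two small readers for the cgroup v2 interface files help. The function names are hypothetical; cpu.max and cpu.stat (with its nr_throttled counter) are standard cgroup v2 files. This sketch assumes you point them at the scope's directory, which you can derive from the 0:: line of /proc/<PID>/cgroup.

```sh
#!/bin/sh
# Turn a cgroup v2 cpu.max file ("<quota> <period>" or "max <period>")
# into a human-readable share of one CPU.
cpu_max_percent() {
  [ -r "$1" ] || return 1
  read -r quota period < "$1"
  if [ "$quota" = "max" ]; then
    echo "unlimited"
  else
    echo "$(( quota * 100 / period ))%"
  fi
}

# Print the number of periods in which the cgroup was throttled (from cpu.stat).
nr_throttled() {
  awk '$1 == "nr_throttled" { print $2 }' "$1"
}
```

For the scope above, cpu_max_percent on its cpu.max should print 50%. With sleep infinity, nr_throttled stays at 0 because sleep uses no CPU; running a busy loop such as sh -c 'while :; do :; done' under the same systemd-run options should make the counter climb while top shows the process hovering around 50%.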

Namespaces provide isolation, while cgroups provide resource control. Together, they enable the illusion of separate machines within a single host. Namespaces give processes their own view of system resources (like network interfaces, process trees, mount points), making them unaware of other processes running outside their namespace. Cgroups then cap how much of the actual host resources these isolated processes can consume.

The two mechanisms are independent kernel features; the real magic is in how a container runtime combines them. When you run a container, the runtime (Docker delegates this to a low-level runtime such as runc) creates new namespaces for the container's processes and places those processes into dedicated cgroups to manage their resource usage. For instance, a container gets its own network namespace (with its own eth0, usually one end of a veth pair), its own PID namespace (its init process is PID 1), its own mount namespace (in which the runtime switches to the container's root filesystem), and so on. At the same time, the cgroups those processes live in limit their CPU, memory, and I/O.
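This division of labor can be sketched as a toy "runtime" in a few lines of shell: create a cgroup, move a process into it, then exec unshare so that the same process, and everything it forks, also enters new namespaces. It is a minimal sketch assuming root, cgroup v2, and util-linux unshare; run_contained, the cgroup name demo, and the CGROOT override are hypothetical, and it skips everything a real runtime also handles, such as switching to an image's root filesystem, networking, and cleanup.

```sh
#!/bin/sh
# Hypothetical toy runtime: run a command in a CPU-limited cgroup and new namespaces.
# $1 = cgroup name under $CGROOT (default /sys/fs/cgroup); remaining args = command.
run_contained() {
  root=${CGROOT:-/sys/fs/cgroup}
  cg="$root/$1"; shift
  echo "+cpu" > "$root/cgroup.subtree_control" || return 1
  mkdir -p "$cg" || return 1
  echo "50000 100000" > "$cg/cpu.max" || return 1   # 50% of one CPU
  # The inner sh writes its own PID into cgroup.procs, then execs unshare.
  # exec keeps the PID, so unshare and every process it forks stay in the
  # cgroup while also entering the new namespaces.
  sh -c 'echo $$ > "$1/cgroup.procs" && shift &&
    exec unshare --net --mount-proc --pid --uts --ipc --fork "$@"' sh "$cg" "$@"
}
```

Sourced into a root shell, run_contained demo /bin/bash should drop you into a shell that is both isolated by namespaces and capped at half a CPU. Real runtimes do the equivalent with system calls rather than CLI tools, but the ordering idea is the same: cgroup membership is inherited across fork and exec, just like namespace membership.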

Here's something that surprises many people: the network namespace created by the unshare command we used earlier is not partially connected to the host; it is completely cut off. It contains nothing but a loopback interface, and even lo starts in the DOWN state, so ping 127.0.0.1 fails inside the container. The isolation is complete; what's missing is connectivity. To make the namespace useful, you bring up the loopback interface (ip link set lo up) and typically create a virtual ethernet (veth) pair: one end stays on the host, the other is moved into the container's network namespace, and both ends get addresses, effectively giving the container its own small virtual network. Without these steps, processes in the container can't reach the host, the internet, or even each other over 127.0.0.1.
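The wiring steps described above can be sketched as one helper run from the host, given the host-side PID of a process inside the container's network namespace (for example, the bash started by unshare, which lsns -t net can help locate). The function name, the interface names veth-host and veth-ctr, and the 10.200.0.0/24 addresses are arbitrary choices for illustration; the ip and nsenter invocations are standard iproute2 and util-linux usage and require root. Setting RUN=echo prints the commands instead of executing them.

```sh
#!/bin/sh
# Hypothetical helper: connect a network namespace to the host with a veth pair.
# $1 = host-side PID of a process inside the target network namespace.
connect_netns() {
  pid=$1
  ${RUN:-} ip link add veth-host type veth peer name veth-ctr || return 1
  # Move one end of the pair into the container's network namespace.
  ${RUN:-} ip link set veth-ctr netns "$pid" || return 1
  ${RUN:-} ip addr add 10.200.0.1/24 dev veth-host || return 1
  ${RUN:-} ip link set veth-host up || return 1
  # Configure the container side by entering only its network namespace.
  ${RUN:-} nsenter --target "$pid" --net sh -c \
    'ip link set lo up && ip addr add 10.200.0.2/24 dev veth-ctr && ip link set veth-ctr up'
}
```

Afterwards, ping -c 1 10.200.0.1 from inside the container should reach the host end of the pair. Reaching the internet would additionally need a default route inside the namespace plus forwarding and NAT on the host, which this sketch leaves out.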

The next step in building more robust containers is often understanding how to manage storage isolation using mount namespaces and overlay filesystems.