Longhorn’s magic is that it turns the local disks of your Kubernetes nodes into what behaves like one giant, resilient hard drive, even when those nodes are spread across different racks or availability zones.
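Let’s see Longhorn in action. Imagine you have a K3s cluster with Longhorn installed, and your application needs persistent storage. You don’t create the Persistent Volume (PV) yourself; you create a PersistentVolumeClaim (PVC) that names the `longhorn` StorageClass, and Longhorn provisions the PV dynamically: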
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
```
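When you apply this PersistentVolumeClaim, Kubernetes passes it to Longhorn’s CSI driver, which in effect hears, "Hey, I need 10Gi of storage that one node at a time can mount read-write." (ReadWriteOnce restricts the volume to a single node, not a single pod.) Longhorn then gets to work. It doesn’t just carve out space on one server. Instead, with the default replica count of three, it creates three replicas of that 10Gi volume, spread across three different worker nodes in your K3s cluster. Each replica holds a complete copy of the volume’s data on its node’s local disk.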
```bash
# After the volume is attached to a pod, you can find its pieces like this
# (paths assume a default install; details vary by Longhorn version).

# On the node where the volume is attached, the Longhorn engine exposes one
# block device for the whole volume, named after the PV (pvc-<uid>):
ls -l /dev/longhorn/

# On each node that holds a replica, that replica's data lives as sparse files
# in a directory under the Longhorn data path (default: /var/lib/longhorn):
ls /var/lib/longhorn/replicas/

# From anywhere with kubectl access, list the Replica objects Longhorn tracks:
kubectl -n longhorn-system get replicas.longhorn.io
```
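By default (with the Replica Node Level Soft Anti-Affinity setting disabled), Longhorn will not put two replicas of the same volume on one node, which is what spreads the three copies across three machines. The toy bash model below illustrates only that rule. The node names and the function `place_replicas` are hypothetical, and it simply takes the first N candidate nodes, whereas the real scheduler in `longhorn-manager` also weighs free disk space, node and disk tags, and zones.

```bash
#!/usr/bin/env bash
# Toy model of replica placement with hard node-level anti-affinity:
# no two replicas of one volume may share a node.
# Node names are hypothetical; the real scheduler also checks disk space, tags and zones.
set -euo pipefail

# place_replicas COUNT NODE...
# Prints one "replica -> node" line per replica, or fails if there are
# fewer candidate nodes than requested replicas.
place_replicas() {
  local count=$1
  shift
  local nodes=("$@")
  if (( ${#nodes[@]} < count )); then
    echo "cannot place ${count} replicas on ${#nodes[@]} node(s)" >&2
    return 1
  fi
  local i
  for (( i = 0; i < count; i++ )); do
    echo "replica-$(( i + 1 )) -> ${nodes[i]}"
  done
}

place_replicas 3 worker-1 worker-2 worker-3 worker-4
```

This prints `replica-1 -> worker-1`, `replica-2 -> worker-2` and `replica-3 -> worker-3`. With only two candidate nodes, `place_replicas 3 worker-1 worker-2` fails instead of doubling up, loosely mirroring the case where Longhorn cannot schedule every replica a volume asks for.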
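Your application pod doesn’t mount a replica directly. Longhorn runs an engine for the volume on the node where the pod is scheduled; the pod mounts the engine’s block device, and the engine keeps all the replicas in sync. If a node holding one of the replicas dies, the engine drops that replica, keeps serving I/O from the remaining ones, and Longhorn rebuilds a replacement on another node. From the application’s point of view, the volume keeps working. If the node running the pod itself fails, Kubernetes has to reschedule the pod and Longhorn reattaches the volume on the new node, so expect a short outage rather than a seamless switch.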
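Longhorn reports each volume’s robustness as healthy, degraded, or faulted depending on how many of its replicas are usable. A simplified bash sketch of that mapping follows, assuming robustness depends only on the number of healthy replicas versus the desired count. The function name `volume_robustness` is hypothetical, and Longhorn’s real state handling has more cases (for example, an `unknown` robustness for detached volumes).

```bash
#!/usr/bin/env bash
# Simplified mapping from replica health to volume robustness. The labels
# mirror what Longhorn reports; the counting rule is a simplification.
set -euo pipefail

# volume_robustness DESIRED_REPLICAS HEALTHY_REPLICAS
volume_robustness() {
  local desired=$1 healthy=$2
  if (( healthy == 0 )); then
    echo "faulted"    # no usable copy left, so I/O fails
  elif (( healthy < desired )); then
    echo "degraded"   # I/O continues while Longhorn rebuilds missing replicas
  else
    echo "healthy"
  fi
}

volume_robustness 3 3   # prints: healthy
volume_robustness 3 2   # prints: degraded (one replica's node went down)
volume_robustness 3 0   # prints: faulted
```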
The problem Longhorn solves is the inherent fragility of traditional storage in a distributed system. If your application data lives on a single disk attached to a single node and that node fails, your data is unreachable (or gone for good if the disk failed too), and your application is down. Longhorn provides a distributed, fault-tolerant layer that abstracts away the underlying node-specific storage. It allows you to treat your cluster’s local disks as a unified, highly available storage pool.
Here’s how it works internally:
- Volume Creation: When you request storage, Longhorn creates a logical volume.
- Replication: It then creates multiple replicas (copies) of this volume. The default is three replicas, placed on different nodes to provide redundancy.
- Engine and Replica Processes: Each volume gets one engine (controller) process, running on the node where the volume is attached; it handles all I/O for the volume and replicates writes to the replicas. Each replica runs as its own replica process on the node that stores its data. Both come from the `longhorn-engine` binary and run inside Longhorn’s instance-manager pods.
- Manager Component: A `longhorn-manager` component (running as a DaemonSet, one pod per node, in your K3s cluster) orchestrates everything. It schedules replicas onto nodes, tracks their health, rebuilds failed replicas, attaches and detaches volumes, and manages the creation and deletion of volumes and replicas.
- Kubernetes Integration: Longhorn uses Kubernetes Custom Resource Definitions (CRDs) such as `Volume`, `Engine`, `Replica`, and `Node` to store its state and communicate with the K3s API server. When a PVC requests `storageClassName: longhorn`, Longhorn’s own CSI (Container Storage Interface) driver, `driver.longhorn.io`, is invoked. This driver talks to `longhorn-manager` to provision the actual Longhorn volume and its replicas.
- Mounting: When a pod using the volume is scheduled, the CSI driver has Longhorn attach the volume to that node. This starts the engine, which exposes the volume as a block device under `/dev/longhorn/` (the default v1 data engine does this through an iSCSI frontend, which is why nodes need `open-iscsi`). The CSI node plugin then formats the device with a filesystem (ext4 by default) on first use and mounts it into the pod.
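The link between `storageClassName: longhorn` and the Longhorn CSI driver is the StorageClass’s `provisioner` field, and per-volume options such as the replica count and filesystem travel as StorageClass parameters. Longhorn’s installer already creates a class named `longhorn`; as a sketch, this script writes a manifest for an additional class. The class name `longhorn-ext4-3r` and the file name are hypothetical, and only the `numberOfReplicas` and `fsType` parameters are shown.

```bash
#!/usr/bin/env bash
# Writes a StorageClass manifest that routes PVCs to the Longhorn CSI driver.
# The class name and output file name are hypothetical.
set -euo pipefail

cat > longhorn-ext4-3r.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-ext4-3r
provisioner: driver.longhorn.io   # Longhorn's CSI driver
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "3"           # replicas per volume, each on its own node by default
  fsType: "ext4"                  # filesystem created on the volume at first use
EOF

# To install it in a cluster: kubectl apply -f longhorn-ext4-3r.yaml
echo "wrote longhorn-ext4-3r.yaml"
```

A PVC would then select this class with `storageClassName: longhorn-ext4-3r`.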
Longhorn keeps replicas consistent through synchronous replication. When the application issues a write, the engine sends it to every healthy replica and acknowledges it to the application only after all of them have stored it. If a replica fails to complete the write, the engine marks it as failed (ERR) and removes it from the I/O path rather than blocking, and Longhorn rebuilds it later. There is no quorum vote: because every acknowledged write is already on each healthy replica, even if a node fails mid-write, any surviving healthy replica holds all the data the application was told was safe.
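Longhorn’s architecture documentation describes the engine writing each block to every healthy replica and marking a replica as failed, rather than stalling, when that replica cannot complete a write. The toy bash model below (bash 4+ for associative arrays) captures only that rule. The replica names `r1` to `r3`, the `failing` flag used to simulate an outage, and `write_block` are hypothetical, and real replicas persist data in sparse files and snapshots rather than shell variables.

```bash
#!/usr/bin/env bash
# Toy model of the engine's write path: a write goes to every healthy replica,
# and a replica that fails the write is marked ERR and dropped from the I/O path.
# Replica names and the failure simulation are hypothetical. Requires bash 4+.
set -euo pipefail

declare -A replica_mode=( [r1]=RW [r2]=RW [r3]=RW )  # RW = healthy, ERR = failed
declare -A stored=()    # last block each replica persisted
declare -A failing=()   # replicas whose next write will fail (simulated outage)

# write_block DATA
# Succeeds (the write is acknowledged) only after every replica that is still
# healthy has stored DATA; fails if no healthy replica is left.
write_block() {
  local data=$1 r written=0
  for r in "${!replica_mode[@]}"; do
    [[ ${replica_mode[$r]} == RW ]] || continue
    if [[ -n ${failing[$r]:-} ]]; then
      replica_mode[$r]=ERR          # stop sending I/O here; Longhorn rebuilds it later
    else
      stored[$r]=$data
      written=$(( written + 1 ))
    fi
  done
  (( written > 0 ))                 # zero healthy replicas left means an I/O error
}

failing[r2]=1                       # r2's node drops off the network
if write_block "hello"; then
  echo "write acknowledged"
fi
echo "r2 is now ${replica_mode[r2]}"
```

Running it prints `write acknowledged` followed by `r2 is now ERR`: the write completes on r1 and r3 while r2 is taken out of service, which is what leaves a volume degraded rather than down.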
The most surprising thing about how Longhorn handles a node disappearing is that it doesn’t immediately build a replacement replica. The engine stops using the unreachable replica and keeps serving I/O from the others, leaving the volume degraded, and Longhorn then waits for a configurable interval (the Replica Replenishment Wait Interval setting, 600 seconds by default). This grace period is crucial because node outages are often transient, such as a reboot or a brief network glitch. If the node comes back within that window, Longhorn reuses the data already on the failed replica, which is typically faster than copying the whole volume to a new node. If the node stays down past the interval, Longhorn creates a new replica on another healthy node and rebuilds it from the remaining replicas, restoring full redundancy.
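The decision Longhorn makes for a degraded volume can be sketched as a small function. The 600-second value matches the documented default of the Replica Replenishment Wait Interval setting; the function name `replenish_action`, its arguments, and the returned labels are hypothetical, and the real manager also has to find a node and disk that can take a new replica before creating one.

```bash
#!/usr/bin/env bash
# Toy model of the replica replenishment decision for a degraded volume.
# 600 s mirrors the documented default of the "Replica Replenishment Wait Interval"
# setting; the function name, arguments and return labels are hypothetical.
set -euo pipefail

REPLENISHMENT_WAIT_SECONDS=600

# replenish_action SECONDS_SINCE_FAILURE FAILED_REPLICA_BACK (yes|no)
replenish_action() {
  local elapsed=$1 back=$2
  if (( elapsed >= REPLENISHMENT_WAIT_SECONDS )); then
    echo "create-new-replica"    # stop waiting; rebuild a fresh replica on another node
  elif [[ $back == yes ]]; then
    echo "reuse-failed-replica"  # the node returned in time; reuse the data it still holds
  else
    echo "wait"                  # may be a reboot or a brief network glitch
  fi
}

replenish_action 45 no     # prints: wait
replenish_action 120 yes   # prints: reuse-failed-replica
replenish_action 900 no    # prints: create-new-replica
```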
Once you have Longhorn volumes working, the next logical step is to explore its advanced features like volume snapshots and backups, which are critical for disaster recovery and data protection strategies.