With GKE Sandbox, GKE can run untrusted code, such as third-party binaries or multi-tenant applications, behind a security boundary that’s much stronger than Linux namespaces alone.
Here’s how it looks in action. Imagine you have a containerized web app that processes user-uploaded files. Normally, if that app had a vulnerability, an attacker could exploit it to break out to the underlying GKE node and potentially reach other workloads running there. Running the app under gVisor puts a much stronger boundary in the way, and on GKE that takes a single field in the pod spec:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: untrusted-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: untrusted-app
  template:
    metadata:
      labels:
        app: untrusted-app
    spec:
      # This is where you tell GKE to use gVisor for this pod.
      # It's applied via a RuntimeClass and belongs in the pod spec, not the container.
      runtimeClassName: gvisor
      containers:
        - name: untrusted-app-container
          image: gcr.io/google-samples/run-untrusted-workloads-with-gvisor:latest
          securityContext:
            privileged: false # GKE Sandbox does not support privileged containers
```
When you deploy this, runtimeClassName: gvisor tells Kubernetes to run this pod with the gVisor runtime. GKE provides the gvisor RuntimeClass once GKE Sandbox is enabled on a node pool, and it schedules pods that request it onto those sandbox-enabled nodes.
So, what is gVisor? At its core, it’s an application kernel for containers. Instead of letting your containerized application make direct system calls to the host Linux kernel, gVisor intercepts them. It then emulates the behavior of the Linux kernel within its own user-space environment.
Think of it like this: the host kernel is the landlord of a building. Normally, your apartment (container) calls the landlord directly for services (system calls like open(), read(), and write()). If your apartment has a broken window (a vulnerability), an attacker who climbs in can reach the landlord’s office (the kernel) and, from there, other apartments.
With gVisor, there’s a building manager (gVisor) between your apartment and the landlord. When you need something, you ask the manager, who handles it directly, carefully goes to the landlord on your behalf, or tells you it can’t be done. The manager’s office is much smaller and more tightly controlled than the landlord’s. So even if your apartment has a broken window, the attacker only reaches the manager’s office, which has limited access to the rest of the building.
This isolation is achieved by gVisor’s two main components:
- Sentry: This is the core of gVisor. It’s a user-space kernel that intercepts the application’s system calls and handles them itself. It implements a large portion of the Linux syscall interface, but inside the sandbox rather than in the host kernel. Sentry is written in Go, a memory-safe language, which helps it avoid many of the memory-corruption bugs that commonly lead to kernel exploits.
- Gofer: This is a file system proxy that runs as a separate process alongside Sentry. It carries out file system operations on Sentry’s behalf, so the application never gets direct access to the host’s file system.
gVisor doesn’t require any special kernel modules or modifications on the host. It uses standard Linux features such as namespaces and seccomp to build its sandbox, but the core isolation comes from Sentry intercepting and implementing syscalls itself. On GKE, this means you can enable gVisor on individual node pools without changing the nodes’ kernel.
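gVisor also runs the Sentry itself under a restrictive seccomp filter. Even the Sentry’s own calls into the host kernel are therefore limited to an allowlist. The Go sketch below models that second layer. HostFilter, ErrKilled, and the syscall names in the allowlist are hypothetical and are not gVisor’s real filter set.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKilled stands in for seccomp terminating the process on a disallowed call.
var ErrKilled = errors.New("SIGSYS: host syscall blocked by seccomp filter")

// HostFilter is a hypothetical allowlist modeled on a seccomp filter that
// applies to the sandbox kernel's own host calls.
type HostFilter struct{ allowed map[string]bool }

// NewHostFilter builds a filter that permits only the named host syscalls.
func NewHostFilter(names ...string) *HostFilter {
	f := &HostFilter{allowed: make(map[string]bool, len(names))}
	for _, n := range names {
		f.allowed[n] = true
	}
	return f
}

// Call lets an allowed host syscall through and blocks everything else.
func (f *HostFilter) Call(name string) error {
	if !f.allowed[name] {
		return ErrKilled
	}
	return nil // a real sandbox would perform the host syscall here
}

func main() {
	f := NewHostFilter("read", "write", "futex", "mmap") // illustrative list only
	fmt.Println(f.Call("futex"))                         // <nil>
	fmt.Println(f.Call("mount"))                         // SIGSYS: host syscall blocked by seccomp filter
}
```

With both layers, an exploit has to defeat the Sentry’s own syscall handling and then also get past the filter on the Sentry’s host calls.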
The primary benefit is a much smaller attack surface. If an application inside a gVisor sandbox has a vulnerability that would normally let an attacker exploit the host kernel, the exploit lands in the Sentry instead. To reach the host’s privileged operations or other pods, the attacker would also have to break out of the Sentry and past the seccomp filters that restrict it. This makes gVisor well suited to running untrusted code, such as user-submitted functions on a serverless platform, third-party plugins, or code from less trusted sources.
What many people don’t realize is that gVisor isn’t a complete Linux kernel emulator. It intentionally implements only the subset of syscalls that common containerized applications need. This is a deliberate security choice: by leaving out obscure or complex syscalls, gVisor keeps its own attack surface small. If your application relies on a syscall that gVisor doesn’t support, the call fails, typically with an ENOSYS ("function not implemented") error. That is often the desired behavior for untrusted workloads. The gVisor documentation lists which syscalls are supported.
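The next step for securing your workloads might be enforcing runtimeClassName: gvisor for specific namespaces. Keep in mind that Pod Security Policies were removed in Kubernetes 1.25. Pod Security Admission’s built-in levels also don’t check runtimeClassName. A rule like this therefore needs a ValidatingAdmissionPolicy or a policy engine such as Gatekeeper.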
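Whichever enforcement mechanism you use, the rule itself is short. This Go sketch expresses it as a plain function. PodRequest, Admit, and the untrusted namespace are hypothetical. Wiring the function into a webhook, or translating it into a policy language, is left out.

```go
package main

import "fmt"

// PodRequest keeps only the fields this hypothetical check needs. As in the
// Kubernetes PodSpec, RuntimeClassName is a pointer that is nil when unset.
type PodRequest struct {
	Namespace        string
	RuntimeClassName *string
}

// sandboxedNamespaces is a hypothetical set of namespaces that must use gVisor.
var sandboxedNamespaces = map[string]bool{"untrusted": true}

// Admit reports whether a pod may be created and, if not, why.
func Admit(p PodRequest) (bool, string) {
	if !sandboxedNamespaces[p.Namespace] {
		return true, "" // the policy does not apply to this namespace
	}
	if p.RuntimeClassName == nil || *p.RuntimeClassName != "gvisor" {
		return false, fmt.Sprintf("namespace %q requires runtimeClassName: gvisor", p.Namespace)
	}
	return true, ""
}

func main() {
	gvisor := "gvisor"
	ok, _ := Admit(PodRequest{Namespace: "untrusted", RuntimeClassName: &gvisor})
	fmt.Println(ok)
	ok, reason := Admit(PodRequest{Namespace: "untrusted"})
	fmt.Println(ok, reason)
}
```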