The most surprising thing about mounting Cloud Storage buckets as volumes in GKE is that nothing is attached to the node the way a disk would be: a userspace filesystem driver (FUSE) presents the bucket as a directory and translates file operations into object API calls.
Let’s see this in action. Imagine you have a GKE cluster and a Google Cloud Storage (GCS) bucket named my-cool-data-bucket. You want to make the contents of this bucket available to a pod as if it were a local disk.
First, we need to enable the Cloud Storage FUSE CSI driver, which is a GKE add-on. This is usually done when creating or updating a cluster, and it requires Workload Identity Federation for GKE to be enabled on the cluster. If your cluster already exists, you can enable the add-on via the gcloud command:
gcloud container clusters update my-gke-cluster \
    --zone us-central1-a \
    --update-addons GcsFuseCsiDriver=ENABLED
Now, let’s create a PersistentVolume (PV) and PersistentVolumeClaim (PVC) to represent our GCS bucket. The key fields are the csi volume source, which names the driver, and its volumeHandle, which is your GCS bucket name.
# pv-gcs-bucket.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-gcs-data
spec:
  capacity:
    storage: 100Gi # This is a symbolic capacity, GCS is effectively infinite
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce # Or ReadOnlyMany / ReadWriteMany, depending on your needs
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: my-cool-data-bucket # Your GCS bucket name
And the PVC to claim it:
# pvc-gcs-bucket.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-gcs-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  volumeName: pv-gcs-data # Explicitly bind to the PV we created
  storageClassName: "" # Important: set to empty string to use pre-provisioned PV
Finally, we define a pod that uses this PVC:
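# pod-with-gcs.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-with-gcs
  annotations:
    gke-gcsfuse/volumes: "true" # Required: GKE injects the gcsfuse sidecar into annotated pods
spec:
  serviceAccountName: my-app-ksa # Hypothetical name; this ServiceAccount needs IAM access to the bucket
  containers:
    - name: my-app
      image: nginx:latest
      ports:
        - containerPort: 80
      volumeMounts:
        - name: gcs-data-volume
          mountPath: /usr/share/nginx/html # Mount point inside the container
  volumes:
    - name: gcs-data-volume
      persistentVolumeClaim:
        claimName: pvc-gcs-data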
When you apply these manifests (kubectl apply -f pv-gcs-bucket.yaml -f pvc-gcs-bucket.yaml -f pod-with-gcs.yaml), the Cloud Storage FUSE CSI driver kicks in. The driver itself runs as node-level pods in the kube-system namespace, but gcsfuse does not run there: for pods annotated with gke-gcsfuse/volumes: "true", GKE injects a sidecar container into the pod, and that sidecar runs gcsfuse in userspace. When your my-app-with-gcs pod starts and requests the gcs-data-volume, the CSI driver sets up the FUSE mount on the node and hands it to the gcsfuse process in the sidecar. gcsfuse then translates standard filesystem operations (like read, write, ls) into GCS API calls. So, when Nginx tries to serve /usr/share/nginx/html/index.html, what actually happens is that gcsfuse fetches the object index.html from my-cool-data-bucket, which in the JSON API is roughly a GET request to https://storage.googleapis.com/storage/v1/b/my-cool-data-bucket/o/index.html?alt=media.
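To make the path-to-object translation concrete, here is a minimal Python sketch of the mapping from a file path under the mount point to a Cloud Storage JSON API download URL. It is an illustration only, not how gcsfuse is implemented: the function name object_media_url is made up for this example, and real gcsfuse also handles directories, caching, and authentication.

```python
import posixpath
from urllib.parse import quote

JSON_API = "https://storage.googleapis.com/storage/v1"


def object_media_url(bucket: str, mount_path: str, file_path: str) -> str:
    """Return the JSON API URL that downloads the object behind file_path.

    Hypothetical helper for illustration; gcsfuse does this internally.
    """
    # The object name is the file's path relative to the mount point.
    object_name = posixpath.relpath(file_path, mount_path)
    if object_name in (".", "..") or object_name.startswith("../"):
        raise ValueError(f"{file_path} is not a file under {mount_path}")
    # The object name is a single URL path segment, so "/" must be
    # percent-encoded too (safe="" removes the default "/" exemption).
    encoded = quote(object_name, safe="")
    return f"{JSON_API}/b/{bucket}/o/{encoded}?alt=media"


url = object_media_url(
    "my-cool-data-bucket",
    "/usr/share/nginx/html",
    "/usr/share/nginx/html/index.html",
)
print(url)
```

Note that a nested file such as css/site.css is still a single object whose name contains a slash; the bucket has no real directories, which is why the slash is percent-encoded in the URL.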
This GCS Fuse CSI driver allows you to leverage GCS as a persistent volume for your GKE workloads. It’s particularly useful for:
- Shared data: Buckets can be accessed by multiple pods, though access modes (ReadWriteOnce vs. ReadOnlyMany) dictate concurrency.
- Large datasets: GCS offers virtually unlimited storage.
- Data ingress/egress: Easily ingest data into GCS and make it available to applications, or export processed data.
The problem this solves is being forced onto block-based storage (like Persistent Disks) for use cases where object storage is more natural and cost-effective, while you still need a filesystem interface for legacy applications or simpler integration.
The levers you control are mostly in the PersistentVolume and PersistentVolumeClaim definitions. The volumeHandle is your direct link to the bucket. accessModes are crucial: ReadWriteOnce allows read-write mounting from a single node, which is typical when one pod owns the data, while ReadOnlyMany allows many pods to read from the same bucket concurrently. ReadWriteMany is also accepted, but gcsfuse provides no concurrency control for several writers to the same object (the last write wins), so for multi-pod writes you’d typically need application-level locking or a different storage solution.
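Because a pre-provisioned PV and its PVC only bind when their fields agree, a quick consistency check can catch mistakes before applying anything. The sketch below is a simplified Python approximation of those rules (claimed volume name, storage class, access modes, capacity), not the actual Kubernetes binder. The helper names to_bytes and binding_problems are hypothetical, and the manifests are written as plain dicts to keep the example free of dependencies.

```python
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Parse a quantity like '100Gi' (assumption: binary suffixes only)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def binding_problems(pv: dict, pvc: dict) -> list:
    """Return human-readable reasons why pvc would not bind to pv."""
    problems = []
    pv_spec, pvc_spec = pv["spec"], pvc["spec"]

    wanted = pvc_spec.get("volumeName")
    if wanted and wanted != pv["metadata"]["name"]:
        problems.append(f"PVC asks for volume {wanted!r}")

    pvc_class = pvc_spec.get("storageClassName")
    if pvc_class is None:
        problems.append("PVC has no storageClassName; a default class may be applied")
    elif pvc_class != (pv_spec.get("storageClassName") or ""):
        problems.append("storageClassName differs between PV and PVC")

    missing = set(pvc_spec["accessModes"]) - set(pv_spec["accessModes"])
    if missing:
        problems.append(f"PV lacks access modes: {sorted(missing)}")

    requested = to_bytes(pvc_spec["resources"]["requests"]["storage"])
    if requested > to_bytes(pv_spec["capacity"]["storage"]):
        problems.append("PVC requests more storage than the PV declares")
    return problems


pv = {
    "metadata": {"name": "pv-gcs-data"},
    "spec": {"capacity": {"storage": "100Gi"}, "accessModes": ["ReadWriteOnce"]},
}
pvc = {
    "metadata": {"name": "pvc-gcs-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "100Gi"}},
        "volumeName": "pv-gcs-data",
        "storageClassName": "",
    },
}
print(binding_problems(pv, pvc))
```

The dicts mirror the PV and PVC from this walkthrough, so the check reports no problems for them; changing the PVC's access mode to one the PV does not list would produce a finding.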
One thing that often trips people up is the performance characteristics. Because gcsfuse is a userspace filesystem, every operation involves context switches between kernel and userspace, and network latency to GCS. This means it’s not suitable for high-IOPS workloads, databases, or applications that require low-latency disk access. It excels at sequential reads/writes and scenarios where latency is less critical than capacity and cost. You might also encounter issues with file locking and atomic operations, as GCS itself is an object store and doesn’t natively support POSIX file locking semantics.
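A back-of-the-envelope model shows why small random reads suffer while large sequential reads do well: each request pays a fixed latency before any bytes arrive. The Python sketch below assumes a simple "latency plus size over bandwidth" cost per read. The 50 ms latency and 100 MB/s bandwidth are made-up illustrative values, not measurements of gcsfuse or GCS.

```python
def read_time_s(size_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Time for one read: fixed per-request latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bps


def effective_throughput(size_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Bytes per second actually achieved when reading size_bytes at a time."""
    return size_bytes / read_time_s(size_bytes, latency_s, bandwidth_bps)


LATENCY_S = 0.05    # assumed per-request latency (illustrative)
BANDWIDTH = 100e6   # assumed bandwidth in bytes per second (illustrative)

for label, size in [("4 KiB", 4 * 1024), ("1 MiB", 2**20), ("1 GiB", 2**30)]:
    mb_per_s = effective_throughput(size, LATENCY_S, BANDWIDTH) / 1e6
    ops_per_s = 1 / read_time_s(size, LATENCY_S, BANDWIDTH)
    print(f"{label:>6}: {mb_per_s:8.2f} MB/s, {ops_per_s:6.2f} reads/s")
```

Under these assumptions, a workload issuing 4 KiB reads one at a time is limited by latency to about 20 reads per second no matter how much bandwidth is available, while a 1 GiB sequential read gets close to the full assumed bandwidth.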
The next concept you’ll likely encounter is managing the lifecycle of data in these buckets, especially object versioning and lifecycle policies on GCS itself.