GKE node upgrades aren’t a single event; they’re a rolling process that happens node by node to minimize disruption, and Maintenance Windows are how you tell GKE when it’s okay to do that rolling upgrade.
Let’s see it in action. Imagine you run a critical application on GKE that absolutely cannot be interrupted during peak business hours. Using a Config Connector ContainerCluster resource, you’d define a Maintenance Window like this:
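```yaml
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerCluster
metadata:
  name: my-gke-cluster
spec:
  location: us-central1-a
  releaseChannel:
    channel: REGULAR
  maintenancePolicy:
    # Config Connector's field for a recurring window (startTime/endTime/recurrence)
    recurringWindow:
      # The timestamps define the first occurrence (a Saturday here);
      # its time of day and three-hour length are what repeat.
      startTime: "2023-10-28T02:00:00Z"
      endTime: "2023-10-28T05:00:00Z"
      recurrence: "FREQ=WEEKLY;BYDAY=SA"
```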
This configuration tells GKE: "For my-gke-cluster in us-central1-a, every Saturday between 2 AM and 5 AM UTC is an acceptable time to start automatic maintenance, such as node upgrades." Within that three-hour window, GKE upgrades your nodes one at a time by default, so an application with enough replicas (and sensible PodDisruptionBudgets) stays available throughout.
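One way to pin down what the window means is to ask, for any given moment, whether GKE is allowed to start automatic maintenance. The sketch below models only the example's rule (Saturdays, 02:00 to 05:00 UTC, start inclusive, end exclusive); `may_start_maintenance` is a hypothetical helper for illustration, not part of any GKE API or a description of how GKE's scheduler is implemented.

```python
from datetime import datetime, time, timezone

# Hypothetical model of the example window: Saturdays, 02:00-05:00 UTC.
WINDOW_WEEKDAY = 5         # datetime.weekday(): Monday=0 ... Saturday=5
WINDOW_START = time(2, 0)  # from startTime ...T02:00:00Z
WINDOW_END = time(5, 0)    # from endTime ...T05:00:00Z


def may_start_maintenance(moment: datetime) -> bool:
    """Return True if `moment` (timezone-aware) falls inside the weekly window."""
    utc = moment.astimezone(timezone.utc)
    return utc.weekday() == WINDOW_WEEKDAY and WINDOW_START <= utc.time() < WINDOW_END


if __name__ == "__main__":
    # 2023-10-28 was a Saturday; 2023-10-31 was a Tuesday.
    print(may_start_maintenance(datetime(2023, 10, 28, 3, 0, tzinfo=timezone.utc)))   # True
    print(may_start_maintenance(datetime(2023, 10, 31, 15, 0, tzinfo=timezone.utc)))  # False
```

Converting to UTC first matters: the window is defined in UTC, so a local evening in the Americas can already be Saturday morning in UTC.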
The problem this solves is simple: GKE needs to keep your nodes updated for security and feature reasons, but you can’t have it doing that randomly at 3 PM on a Tuesday. Maintenance Windows give you control over the when.
Internally, GKE tracks which upgrades are due for your cluster. When one is due and a Maintenance Window is open, GKE starts a rolling upgrade. With the default surge settings (maxSurge=1, maxUnavailable=0), it creates a replacement node on the new version, cordons and drains an old node (evicting its Pods gracefully and respecting PodDisruptionBudgets for up to an hour), deletes the old node, and only then moves on to the next one. This is the "rolling" part: it’s not an all-or-nothing event. The Maintenance Window dictates when this rolling process may begin.
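To make the rolling sequence concrete, here is a toy model of a surge upgrade with maxUnavailable=0. `Node`, `surge_upgrade`, and the version strings are hypothetical; the model ignores Pod scheduling, PodDisruptionBudgets, and failures, and only tracks the pool size to show that capacity never drops below the original node count.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str


def surge_upgrade(pool: list[Node], target: str, max_surge: int = 1) -> tuple[list[Node], list[int]]:
    """Toy surge upgrade with maxUnavailable=0.

    For each batch of up to `max_surge` outdated nodes, first add the same
    number of nodes on `target`, then drain and delete the outdated ones.
    Returns the final pool and the pool size recorded after every step.
    """
    pool = list(pool)  # work on a copy; the caller's list is left untouched
    sizes: list[int] = []
    outdated = [n for n in pool if n.version != target]
    created = 0
    for i in range(0, len(outdated), max_surge):
        batch = outdated[i:i + max_surge]
        # Step 1: bring up replacement (surge) nodes on the new version.
        for _ in batch:
            created += 1
            pool.append(Node(f"new-{created}", target))
        sizes.append(len(pool))
        # Step 2: cordon and drain the old nodes (Pods reschedule elsewhere), then delete them.
        for node in batch:
            pool.remove(node)
        sizes.append(len(pool))
    return pool, sizes


if __name__ == "__main__":
    nodes = [Node(f"old-{i}", "1.27") for i in range(3)]
    upgraded, sizes = surge_upgrade(nodes, "1.28")
    print(sizes)                       # [4, 3, 4, 3, 4, 3]
    print([n.name for n in upgraded])  # ['new-1', 'new-2', 'new-3']
```

Raising `max_surge` shortens the upgrade (fewer sequential batches) at the cost of briefly running more extra nodes.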
The levers you control are startTime, endTime, and recurrence. startTime and endTime set the time of day and the length of each occurrence (their date only anchors the first one), while recurrence, written as an iCalendar (RFC 5545) RRULE, specifies how often the window repeats. You can schedule daily, weekly, monthly, or even more complex patterns; for instance, FREQ=DAILY;INTERVAL=2 means every other day.
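To see which days a recurrence rule actually selects, the sketch below expands a small RRULE subset (FREQ=DAILY or FREQ=WEEKLY, with optional INTERVAL and BYDAY) into dates and totals the window hours over 32 days, the period GKE uses when it checks minimum maintenance availability. `expand` is a hypothetical helper, not a full RFC 5545 implementation and not what GKE runs; weeks start on Monday, the RFC's default WKST.

```python
from datetime import date, timedelta

# RFC 5545 two-letter weekday codes mapped to date.weekday() values.
WEEKDAYS = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def expand(rule: str, first: date, until: date) -> list[date]:
    """Dates in [first, until) matched by a small RRULE subset.

    Supports FREQ=DAILY or FREQ=WEEKLY with optional INTERVAL and BYDAY;
    anything else raises ValueError.
    """
    parts = dict(p.split("=", 1) for p in rule.split(";"))
    freq = parts.pop("FREQ")
    interval = int(parts.pop("INTERVAL", "1"))
    byday = parts.pop("BYDAY", None)
    if parts or freq not in ("DAILY", "WEEKLY"):
        raise ValueError(f"unsupported rule: {rule}")
    days = {WEEKDAYS[d] for d in byday.split(",")} if byday else None
    first_monday = first - timedelta(days=first.weekday())  # start of first's week
    out = []
    day = first
    while day < until:
        if freq == "DAILY":
            period_ok = (day - first).days % interval == 0
            day_ok = days is None or day.weekday() in days
        else:  # WEEKLY: count whole weeks since the week containing `first`
            period_ok = ((day - first_monday).days // 7) % interval == 0
            day_ok = day.weekday() in (days if days is not None else {first.weekday()})
        if period_ok and day_ok:
            out.append(day)
        day += timedelta(days=1)
    return out


if __name__ == "__main__":
    start = date(2023, 10, 28)  # a Saturday
    saturdays = expand("FREQ=WEEKLY;BYDAY=SA", start, start + timedelta(days=32))
    print(len(saturdays) * 3)  # hours of three-hour windows in 32 days: 15
```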
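A common misconception is that a Maintenance Window guarantees that maintenance both starts and finishes inside it. That’s not quite right. The window controls when GKE may begin automatic maintenance; it doesn’t guarantee completion. If an upgrade starts at 4:30 AM UTC and takes longer than 30 minutes, it can run past the 5 AM UTC endTime. That’s why the window should be long enough for your node pools to finish upgrading, which takes longer the more nodes they have. GKE also enforces a floor of its own: its documentation requires at least 48 hours of maintenance availability in any rolling 32-day period, counting only contiguous stretches of at least four hours, so a single three-hour Saturday slot like the one above would need to be widened before GKE accepts it.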
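A rough sizing check helps decide whether a window is long enough. The sketch assumes nodes are replaced in sequential batches of `max_surge` and that each batch takes a fixed number of minutes; both `minutes_per_batch` and the helper names are hypothetical inputs you would estimate from your own past upgrades, since real durations depend on drain time, PodDisruptionBudgets, image pulls, and GKE's own limits, none of which are modeled here.

```python
import math


def estimated_upgrade_minutes(node_count: int, minutes_per_batch: float, max_surge: int = 1) -> float:
    """Rough surge-upgrade duration: ceil(node_count / max_surge) sequential batches."""
    batches = math.ceil(node_count / max_surge)
    return batches * minutes_per_batch


def fits_in_window(node_count: int, minutes_per_batch: float, window_minutes: float, max_surge: int = 1) -> bool:
    """True if the estimated upgrade duration fits within one window occurrence."""
    return estimated_upgrade_minutes(node_count, minutes_per_batch, max_surge) <= window_minutes


if __name__ == "__main__":
    # Assumed: 20 nodes, about 10 minutes per batch, a 3-hour (180-minute) window.
    print(estimated_upgrade_minutes(20, 10))         # 200
    print(fits_in_window(20, 10, 180))               # False
    print(fits_in_window(20, 10, 180, max_surge=2))  # True
```

Under these assumptions, a 20-node pool overruns the three-hour window with maxSurge=1 but fits with maxSurge=2, which is the trade-off between window length and surge capacity.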
Perhaps the most surprising thing about Maintenance Windows is that they don’t apply only to node upgrades. They also govern when GKE performs automatic control plane upgrades, and potentially other system-level updates that might briefly affect cluster operations. For a zonal cluster like this one in us-central1-a, a control plane upgrade makes the Kubernetes API briefly unavailable. So even if your node pools had nothing to upgrade, your cluster could still undergo maintenance during these windows. (On a release channel such as REGULAR, node auto-upgrade can’t be disabled anyway.)
If you’ve set up your Maintenance Windows correctly, the next thing you’ll likely be thinking about is how to manage other critical application-specific maintenance during these periods.