GKE private clusters are a nightmare to get right the first time, especially when you’re trying to lock down your nodes and avoid public IP addresses.

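The core issue is that your nodes need to talk to the GKE control plane, which runs in a Google-managed network outside your VPC, for Kubernetes API access, node registration, and essential operations. When you remove public IPs from your nodes, every path they depend on (control plane, Google APIs, image registries) has to work over private addressing or an explicit NAT, and any one of them can break.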

Here’s how to fix it, covering all the common gotchas:

1. Your Nodes Need a Network Path to the Control Plane

Diagnosis: You’ll see nodes stuck in NotReady in kubectl get nodes, or missing from the list entirely because they never registered. In the GKE UI, nodes might show Stuck in creating or Node error.
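To narrow down which nodes are affected, a small filter over the kubectl output can help. The sketch below is only an illustration: not_ready_nodes is a hypothetical helper name, and it assumes the default column layout of kubectl get nodes --no-headers (node name first, status second).

```shell
# Hypothetical helper: print "NAME STATUS" for every node whose status
# does not start with "Ready" (so Ready,SchedulingDisabled is skipped).
# Reads the output of `kubectl get nodes --no-headers` on stdin.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 " " $2 }'
}

# Usage (assumes kubectl already points at the cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every registered node reports Ready; it says nothing about nodes that never registered, so compare the node count against the node pool size as well.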

Cause: Your private nodes can’t reach the public GKE control plane endpoints.

Fix 1: Private Service Access (PSA) with Authorized Networks

This is the recommended and most secure approach. You’ll create a private connection from your VPC network to the Google APIs and services, including the GKE control plane.

  • Step 1: Enable Private Service Access. In your VPC network, create a private services access connection.

    gcloud compute addresses create google-managed-services-YOUR_VPC_NETWORK \
        --global \
        --purpose=VPC_PEERING \
        --prefix-length=20 \
        --description="Peering for Google APIs" \
        --network=YOUR_VPC_NETWORK
    

    Then, create the peering connection:

    gcloud services vpc-peerings connect \
        --service=servicenetworking.googleapis.com \
        --ranges=google-managed-services-YOUR_VPC_NETWORK \
        --network=YOUR_VPC_NETWORK \
        --project=YOUR_PROJECT_ID
    

    The first command allocates an IP range within your VPC for Google-managed services. The second peers your VPC with the service networking producer network over that range.

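  • Step 2: Enable the GKE Control Plane Private Endpoint. When creating your GKE cluster, specify the --enable-private-endpoint flag. With --enable-private-nodes the control plane gets an internal IP address reachable from your VPC; --enable-private-endpoint additionally turns off access through its public endpoint.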

    gcloud container clusters create YOUR_CLUSTER_NAME \
        --region=YOUR_REGION \
        --network=YOUR_VPC_NETWORK \
        --subnetwork=YOUR_SUBNETWORK \
        --enable-private-nodes \
        --enable-private-endpoint \
        --master-ipv4-cidr=172.16.0.0/28 \
        --project=YOUR_PROJECT_ID
    

    The --master-ipv4-cidr is a dedicated, internal IP range for the control plane. It must be a /28 that does not overlap any other range in your VPC.

  • Step 3: Configure Authorized Networks (Crucial for Private Endpoint). With only a private endpoint, the control plane accepts API traffic from internal addresses only, and authorized networks narrow that to the ranges you list. Add every internal range that needs API access, including your node subnet.

    gcloud container clusters update YOUR_CLUSTER_NAME \
        --region=YOUR_REGION \
        --enable-master-authorized-networks \
        --master-authorized-networks=YOUR_NODE_SUBNET_CIDR \
        --project=YOUR_PROJECT_ID
    

    Replace YOUR_NODE_SUBNET_CIDR with the CIDR range of your GKE node subnet. The flag takes a comma-separated list of CIDR blocks and replaces the existing list on each update, so include every range you want to keep.
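One easy mistake in the steps above is picking a --master-ipv4-cidr that collides with the node subnet. The following is a minimal sketch of an overlap check in plain POSIX shell; the function names (ip_to_int, cidr_range, cidrs_overlap) are hypothetical helpers written for this article, and the check covers IPv4 only.

```shell
# Hypothetical helpers for sanity-checking CIDR choices before creating a cluster.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1   # split the address on dots into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print "first last" (as integers) for a CIDR block such as 10.0.0.0/20.
cidr_range() (
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  first=$(( $base - $base % $size ))
  echo "$first $(( $first + $size - 1 ))"
)

# Succeed (exit 0) if the two CIDR blocks share at least one address.
cidrs_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  # $1 $2 = first/last of block A, $3 $4 = first/last of block B
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Usage: compare the control plane range against the node subnet.
#   if cidrs_overlap 172.16.0.0/28 10.0.0.0/20; then
#     echo "ranges overlap, pick a different --master-ipv4-cidr"
#   fi
```

The same check is worth running against your pod and service secondary ranges, since the control plane range must not collide with those either.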

Why it works: Private services access gives Google-managed services a peered, private range inside your VPC. The --enable-private-endpoint flag restricts the control plane to its internal IP, and authorized networks define which internal ranges may reach it.

Fix 2: Using a NAT Gateway (Less Secure, More Complex)

If PSA isn’t an option or you’re dealing with legacy setups, you can use a NAT gateway for your nodes to reach the public internet.

  • Step 1: Create a Cloud NAT Gateway. Configure Cloud NAT for your node subnet.

    gcloud compute routers create nat-router \
        --network=YOUR_VPC_NETWORK \
        --region=YOUR_REGION
    
    gcloud compute routers nats create YOUR_NAT_GATEWAY_NAME \
        --router=nat-router \
        --region=YOUR_REGION \
        --auto-allocate-nat-external-ips \
        --nat-custom-subnet-ip-ranges=YOUR_SUBNETWORK \
        --enable-logging
    

    This does not give the nodes external IP addresses; Cloud NAT translates their outbound traffic to the gateway’s external IPs.

  • Step 2: Ensure Control Plane Reachability (via Public IP). For this to work, your GKE cluster must not have --enable-private-endpoint set, so the control plane keeps a public IP. Traffic that goes out through Cloud NAT arrives at that public IP from the NAT gateway’s external IP(s), so you need to add those IPs to the control plane’s authorized networks explicitly.

    gcloud container clusters update YOUR_CLUSTER_NAME \
        --region=YOUR_REGION \
        --enable-master-authorized-networks \
        --master-authorized-networks=YOUR_NAT_EXTERNAL_IP/32 \
        --project=YOUR_PROJECT_ID
    

    You’ll need to find the external IPs assigned to your NAT gateway and add all of them, comma-separated, to the authorized networks. Auto-allocated NAT IPs can change, so reserve static addresses for the gateway if you rely on this list.
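If the gateway has several external IPs, building the flag value by hand is error-prone. The sketch below assumes you have already collected the NAT IPs one per line (for example by reading them from the output of gcloud compute routers get-status nat-router --region=YOUR_REGION); to_authorized_networks is a hypothetical helper name, not a gcloud feature.

```shell
# Hypothetical helper: turn IPv4 addresses (one per line on stdin) into the
# comma-separated list of /32 blocks passed to --master-authorized-networks.
# Blank lines are ignored.
to_authorized_networks() {
  awk 'NF { printf "%s%s/32", sep, $1; sep = "," } END { if (sep) print "" }'
}

# Usage (the two addresses are documentation placeholders):
#   NETWORKS=$(printf '%s\n' 203.0.113.10 203.0.113.11 | to_authorized_networks)
#   gcloud container clusters update YOUR_CLUSTER_NAME \
#       --region=YOUR_REGION \
#       --enable-master-authorized-networks \
#       --master-authorized-networks="$NETWORKS" \
#       --project=YOUR_PROJECT_ID
```

Remember to append any other ranges you still need, since the flag sets the whole list rather than adding to it.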

Why it works: The NAT gateway provides your private nodes with a route to the public internet, allowing them to reach the GKE control plane’s public IP. Authorized networks then permit this specific NAT IP to communicate with the control plane.

2. Your Nodes Need to Reach the GKE Control Plane API

Diagnosis: Similar to above (nodes NotReady or failing to register), but specifically when you’re trying to upgrade nodes or apply cluster-wide configurations that require the nodes to actively call the API.

Cause: Firewall rules are blocking traffic from your nodes to the GKE control plane’s IP.

Fix: Add Firewall Rules for Control Plane Access

  • For Private Clusters with Private Endpoint (PSA): Ensure your VPC firewall rules allow egress traffic from your nodes to the --master-ipv4-cidr you specified during cluster creation. A VPC allows all egress by default, so you only need this rule if you have added a deny-egress rule; give it a lower priority number than that deny rule so it wins.

    gcloud compute firewall-rules create allow-gke-master-private-egress \
        --network=YOUR_VPC_NETWORK \
        --action=ALLOW \
        --direction=EGRESS \
        --rules=tcp:443,tcp:80,udp \
        --destination-ranges=172.16.0.0/28 \
        --project=YOUR_PROJECT_ID
    

    Replace 172.16.0.0/28 with your actual --master-ipv4-cidr. Egress rules match on destination ranges; to limit the rule to node VMs, add --target-tags with your nodes’ network tag.

  • For Clusters with Public Control Plane IP (using NAT): Ensure your VPC firewall rules allow egress traffic from your nodes to the public IP address of the GKE control plane. This is the cluster’s endpoint address, which gcloud container clusters describe reports in its endpoint field. (The ranges 35.191.0.0/16 and 130.211.0.0/22 that appear in many firewall examples are Google load balancer health-check sources, not control plane addresses.) You’ll need to add a rule like this:

    gcloud compute firewall-rules create allow-gke-master-public-egress \
        --network=YOUR_VPC_NETWORK \
        --action=ALLOW \
        --direction=EGRESS \
        --rules=tcp:443 \
        --destination-ranges=YOUR_CONTROL_PLANE_PUBLIC_IP/32 \
        --project=YOUR_PROJECT_ID
    

    Check the official GKE documentation for current guidance on control plane addressing.
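For the public-endpoint case, the destination of the egress rule can be read from the cluster itself. This is a rough sketch: control_plane_cidr is a hypothetical wrapper, and it assumes the cluster’s endpoint field holds the address your nodes should reach (on a cluster with only a private endpoint, that field holds the internal address instead).

```shell
# Hypothetical wrapper: print the cluster endpoint as a /32 CIDR block.
#   $1 = cluster name, $2 = region
control_plane_cidr() (
  endpoint=$(gcloud container clusters describe "$1" \
      --region="$2" \
      --format="value(endpoint)") || exit 1
  # Fail rather than print a bare "/32" if nothing came back.
  [ -n "$endpoint" ] || exit 1
  echo "$endpoint/32"
)

# Usage:
#   control_plane_cidr YOUR_CLUSTER_NAME YOUR_REGION
# The printed value can be passed to --destination-ranges in an egress rule.
```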

Why it works: These firewall rules explicitly permit the necessary outbound connections from your nodes to the GKE control plane, allowing them to register, receive commands, and report status.

3. Your Nodes Need to Pull Container Images

Diagnosis: Pods stuck in ImagePullBackOff or ErrImagePull.
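To see how widespread the problem is, you can filter the pod list across all namespaces. The helper below is an illustrative sketch (image_pull_failures is a made-up name) and assumes the default columns of kubectl get pods --all-namespaces --no-headers, where the status is the fourth field.

```shell
# Hypothetical helper: print "NAMESPACE/POD STATUS" for pods whose status
# ends in ImagePullBackOff or ErrImagePull (this also catches init
# containers, which report e.g. Init:ErrImagePull).
# Reads `kubectl get pods --all-namespaces --no-headers` on stdin.
image_pull_failures() {
  awk '$4 ~ /(ImagePullBackOff|ErrImagePull)$/ { print $1 "/" $2 " " $4 }'
}

# Usage:
#   kubectl get pods --all-namespaces --no-headers | image_pull_failures
```

If only images from non-Google registries show up here, the gap is most likely missing NAT rather than Private Google Access.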

Cause: Your nodes cannot reach public container registries like gcr.io or docker.io.

Fix 1: Private Google Access (for GCR/Artifact Registry)

Ensure Private Google Access is enabled for your subnet. This allows VMs in your VPC to reach Google APIs and services (including GCR/Artifact Registry) using internal IP addresses.

  • Step 1: Enable Private Google Access on the subnet.
    gcloud compute networks subnets update YOUR_SUBNETWORK \
        --region=YOUR_REGION \
        --enable-private-ip-google-access \
        --project=YOUR_PROJECT_ID
    
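    This lets nodes without external IPs reach Google APIs and services over Google’s network. If you want that traffic to use the private.googleapis.com or restricted.googleapis.com addresses, you also need DNS records and routes for those domains.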
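It is worth confirming the setting actually took effect on the subnet your nodes use. The check below is a sketch under one assumption: that gcloud prints the privateIpGoogleAccess field as True or False with the value() format. pga_enabled is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) if Private Google Access is enabled.
#   $1 = subnet name, $2 = region
pga_enabled() (
  value=$(gcloud compute networks subnets describe "$1" \
      --region="$2" \
      --format="value(privateIpGoogleAccess)") || exit 1
  # Accept either capitalisation of the boolean.
  case "$value" in
    [Tt]rue) exit 0 ;;
    *)       exit 1 ;;
  esac
)

# Usage:
#   if pga_enabled YOUR_SUBNETWORK YOUR_REGION; then
#     echo "Private Google Access is on"
#   else
#     echo "Private Google Access is off (or the lookup failed)"
#   fi
```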

Fix 2: NAT Gateway (for other registries like Docker Hub)

If you are pulling images from registries other than Google’s (like docker.io), you’ll need outbound internet access.

  • Step 1: Configure Cloud NAT (as in Fix 2 for Control Plane Reachability). Ensure your node subnet is covered by a Cloud NAT gateway with external IP addresses.

Why it works: Private Google Access routes traffic to Google’s container registries through Google’s internal network, bypassing the public internet. A NAT gateway provides outbound internet access for other registries.

4. GKE Control Plane Needs to Reach Your Nodes (for kubectl exec, Logs, Webhooks, etc.)

Diagnosis: This is less common for initial setup but can manifest during rolling updates or specific GKE operations. You might see kubectl logs or kubectl exec time out, or admission webhook calls fail, while the nodes themselves report Ready.

Cause: Firewall rules block traffic from the control plane’s IP range to your nodes.

Fix: Allow Ingress from the Control Plane Range

Authorized networks only govern who may call the control plane; they do not affect traffic from the control plane to your nodes. That direction is controlled by VPC firewall rules.

  • Step 1: Allow the Control Plane CIDR to Reach Your Nodes. GKE creates an ingress rule for tcp:443 and tcp:10250 automatically. If that rule was deleted, or a workload such as an admission webhook listens on another port, add an ingress rule from your --master-ipv4-cidr.
    gcloud compute firewall-rules create allow-gke-master-to-nodes \
        --network=YOUR_VPC_NETWORK \
        --action=ALLOW \
        --direction=INGRESS \
        --rules=tcp:443,tcp:10250 \
        --source-ranges=172.16.0.0/28 \
        --target-tags=YOUR_NODE_NETWORK_TAG \
        --project=YOUR_PROJECT_ID
    
    Replace 172.16.0.0/28 with your actual --master-ipv4-cidr and YOUR_NODE_NETWORK_TAG with the network tag on your node VMs. Add any extra webhook ports your workloads use to --rules.

Why it works: The ingress rule explicitly permits connections from the control plane’s internal range to your nodes, which the control plane needs for tasks such as reaching the kubelet and calling webhooks, even though the control plane itself is only accessible via its private IP.

If your node pools are configured with Ephemeral OS images, the next error you’ll hit will likely be nodes failing to provision persistent disks, caused by the lack of egress if you haven’t configured NAT or Private Google Access for storage.
