Automatically Provision Right-Sized GKE Nodes with NAP (2026)

Node Auto Provisioning (NAP) in Google Kubernetes Engine (GKE) doesn’t just add nodes when you’re out of capacity; it creates new node pools sized for the pods that are waiting, choosing machine types that fit their CPU, memory, and hardware requests.

Let’s see NAP in action. Imagine you’ve got a deployment with a few pods that are pretty hungry for CPU but don’t need much memory.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-hogs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cpu-hogs
  template:
    metadata:
      labels:
        app: cpu-hogs
    spec:
      containers:
      - name: hogger
        image: ubuntu:latest # A simple image for demonstration
        command: ["/bin/sh", "-c", "while true; do true; done"] # Infinite loop to consume CPU
        resources:
          requests:
            cpu: "1"
          limits:
            cpu: "2"

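When you apply this, the Kubernetes scheduler tries to place the pods on existing nodes. If no node has enough unreserved capacity for their requests, the pods stay Pending and NAP kicks in. Rather than adding another node of whatever type you already run, NAP looks at what the pending pods ask for (a request of 1 CPU each here; limits do not affect scheduling) along with any node selectors, affinities, and tolerations. It then creates a new node pool whose machine type can hold those pods, staying within the cluster-wide resource limits you configured and favoring cost-effective options. Note that a node’s allocatable CPU is slightly less than its nominal vCPU count because GKE reserves a share for system components, so a 1-CPU request needs a machine with more than one vCPU.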
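If the workload should land on a particular machine family, the pod template can say so. The following is a minimal sketch, assuming the compute-optimized C2 family is available in your zones and using the cloud.google.com/machine-family node selector; the Deployment name cpu-hogs-c2 is hypothetical, and the rest mirrors the manifest above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-hogs-c2            # hypothetical name for this variant
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cpu-hogs-c2
  template:
    metadata:
      labels:
        app: cpu-hogs-c2
    spec:
      nodeSelector:
        # Assumption: asks NAP for nodes from the C2 machine family
        cloud.google.com/machine-family: c2
      containers:
      - name: hogger
        image: ubuntu:latest
        command: ["/bin/sh", "-c", "while true; do true; done"]
        resources:
          requests:
            cpu: "1"           # scheduling and node sizing use this value
          limits:
            cpu: "2"
```

Without the selector, NAP is free to pick its default machine series, which is usually the cheaper choice for workloads that have no special hardware needs.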

NAP targets the common problem of over-provisioning. Developers often set generous resource requests to avoid out-of-memory kills or CPU throttling, and operators often pick large general-purpose nodes to be safe, leading to nodes that sit mostly idle and waste money. NAP addresses the node side of this by matching new capacity to what pods actually request, so accurate requests translate directly into right-sized nodes. It does this by:

Monitoring Pending Pods: GKE continuously watches for pods that are in a Pending state because no existing node can satisfy their resource requests.
Analyzing Resource Needs: For these pending pods, NAP inspects their CPU, memory, and GPU requests, plus scheduling constraints such as node selectors, affinities, and tolerations. Limits are not used for placement.
Evaluating Available Machine Types: NAP looks for a machine type that can fit the pending pods while keeping the cluster within its configured resource limits. By default it draws from a general-purpose series (E2 at the time of writing) unless the pod asks for a different machine family, and it favors cheaper options that still fit.
Provisioning New Nodes: If an existing autoscaled node pool can fit the pods, the cluster autoscaler scales that pool up; otherwise NAP creates a new node pool with the chosen machine type. The pending pods are scheduled once the new nodes have booted and registered with the cluster.

The key levers you control are:

Resource limits (--min-cpu, --max-cpu, --min-memory, --max-memory): These set the floor and ceiling on total CPU cores and memory (in GB) across the whole cluster, not a node count. NAP will not create nodes that push the cluster past the maximums, which prevents runaway scaling.
Enabling NAP (--enable-autoprovisioning): NAP is switched on for the cluster as a whole, not per node pool. Node pools you created yourself keep their own autoscaling settings (--enable-autoscaling, --min-nodes, --max-nodes) and continue to scale independently.
Locations (--autoprovisioning-locations): Listing the zones where NAP may create node pools ensures nodes are provisioned where you need them.
Machine family (per workload): NAP does not take a cluster-level allow-list of machine types. Instead, you steer it from the workload by adding a cloud.google.com/machine-family node selector to the pod spec, which constrains NAP to a family such as E2 or N2; NAP then picks a specific type within it (for example, n2-standard-2).
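The cluster-level settings can also be kept in a file and applied in one step. What follows is a sketch of such a file, assuming it is passed to gcloud container clusters update with --enable-autoprovisioning and --autoprovisioning-config-file; the file name, zones, and numbers are illustrative placeholders rather than recommendations.

```yaml
# nap-config.yaml (hypothetical file name)
resourceLimits:
  - resourceType: 'cpu'        # total vCPU cores across the cluster
    minimum: 4
    maximum: 64
  - resourceType: 'memory'     # total memory across the cluster, in GB
    maximum: 256
autoprovisioningLocations:     # zones where NAP may create node pools
  - us-central1-a
  - us-central1-b
```

Because the limits cap totals rather than node counts, the same ceiling can be reached by a few large nodes or by many small ones, which leaves NAP free to choose the shape.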

Consider a scenario where you have a workload that requires GPUs, but your existing node pools don’t have any. If a pod requests a GPU and no existing node can satisfy it, NAP can provision a new node pool with GPU-attached instances, ensuring your specialized workloads get the hardware they need. Two things have to be in place for this: the cluster’s NAP configuration must include a limit for that GPU type (for example via --max-accelerator), and the pod must request nvidia.com/gpu, typically naming the GPU model through the cloud.google.com/gke-accelerator node selector.
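A GPU request of this kind might look like the sketch below. It assumes the cluster allows the nvidia-tesla-t4 accelerator type and that the GPU is offered in one of the zones NAP can use; the pod name and container image are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo               # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    # Assumption: T4 GPUs are permitted by the cluster's NAP limits
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
  - name: cuda
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04   # placeholder image
    command: ["/bin/bash", "-c", "sleep 600"]
    resources:
      limits:
        nvidia.com/gpu: 1      # GPUs are requested through limits
```

If the pod stays Pending, a missing or exhausted GPU limit in the NAP configuration is one of the first things worth checking.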

This works because NAP doesn’t just total the resources needed by all pending pods; it evaluates how those pods, each with its own requests and constraints, would fit onto candidate machine types. A single large pod requesting a lot of CPU might trigger the provisioning of one powerful node, while multiple smaller pods might collectively trigger the provisioning of several smaller, more cost-effective nodes. The system is designed to be granular and cost-aware, avoiding the common pitfall of allocating large, expensive machines when smaller ones would suffice.

The next thing you’ll likely encounter is optimizing the mix of machine types that NAP can choose from, balancing cost, performance, and availability for your diverse workloads.