You can slash the cost of your Google Cloud Compute Engine VMs by up to 91% by using Spot VMs, but you’re essentially renting compute that Google might reclaim at any moment.

Let’s see this in action. Imagine you have a batch processing job that needs a lot of CPU for a few hours, but it can be interrupted and restarted without losing significant work.

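# Create a Spot VM instance
gcloud compute instances create batch-processor \
  --zone=us-central1-a \
  --machine-type=n1-standard-8 \
  --image-project=debian-cloud \
  --image-family=debian-11 \
  --provisioning-model=SPOT \
  --instance-termination-action=STOP \
  --boot-disk-size=100GB \
  --tags=batch-job

This command provisions an n1-standard-8 VM in us-central1-a. The --provisioning-model=SPOT flag is the key here: it tells Compute Engine to create the instance as a Spot VM and bill it at Spot prices. The --instance-termination-action flag controls what happens on preemption. STOP (the default) keeps the VM and its disks so you can start it again later, while DELETE removes the VM. The --tags flag adds a network tag, which is useful for targeting these instances later, for example with firewall rules or from your batch job orchestrator.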

Now, what is this "reclaim" we’re talking about? Spot VMs run on Compute Engine’s excess capacity. When Compute Engine needs that capacity back, for example to serve on-demand customers, it can reclaim (preempt) your Spot VM at any time. You receive a preemption notice, and the VM then has up to 30 seconds to shut down before it is stopped or deleted.

The appeal of Spot VMs is that they run on the same hardware, with the same performance, as regular VMs, but at a drastically reduced price. The catch is preemption. Preemptible VMs are the older generation of this concept, and they are automatically stopped after at most 24 hours of runtime. Spot VMs are also preemptible but have no maximum runtime; they can keep running until Compute Engine needs the capacity back. For most interruptible workloads, Spot VMs are the preferred choice because of this flexibility.

To manage preemption gracefully, your application needs to be designed for it. This means:

  1. Checkpointing: Regularly save the state of your work so that, if the VM is preempted, you can restart from the last saved checkpoint. For batch jobs, this might involve writing intermediate results to durable storage such as Cloud Storage or a database.

  2. Graceful Shutdown Handling: Your application should watch for the signal that a preemption has started (for example, the preempted value on the metadata server). When that happens, it should save its current state and exit cleanly before the shutdown window closes.
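
The two practices can be combined in a small worker loop. This is a minimal sketch, assuming the job runs as a bash process that receives SIGTERM when the guest OS begins shutting down after a preemption notice. CHECKPOINT_FILE, TOTAL_ITEMS, and process_item are hypothetical placeholders; a real job would keep its checkpoints in durable storage such as Cloud Storage, since local disk may not survive if the VM is deleted.

```shell
#!/usr/bin/env bash
# Minimal checkpoint/resume sketch for an interruptible batch job.
# CHECKPOINT_FILE, TOTAL_ITEMS, and process_item are hypothetical placeholders.
set -euo pipefail

CHECKPOINT_FILE="${CHECKPOINT_FILE:-/tmp/batch-job.checkpoint}"
TOTAL_ITEMS="${TOTAL_ITEMS:-100}"

# Print the index of the next item to process (0 if there is no checkpoint yet).
load_checkpoint() {
  if [[ -f "$CHECKPOINT_FILE" ]]; then
    cat "$CHECKPOINT_FILE"
  else
    echo 0
  fi
}

# Record progress by writing a temp file and renaming it over the old
# checkpoint, so an interrupted write never leaves a half-written file.
save_checkpoint() {
  echo "$1" > "${CHECKPOINT_FILE}.tmp"
  mv "${CHECKPOINT_FILE}.tmp" "$CHECKPOINT_FILE"
}

# Hypothetical unit of work; receives the item index as $1.
process_item() {
  sleep 0.01
}

next=$(load_checkpoint)

# When the OS sends SIGTERM during shutdown, persist progress and exit.
trap 'save_checkpoint "$next"; echo "stopped before item $next"; exit 143' TERM

while (( next < TOTAL_ITEMS )); do
  process_item "$next"
  next=$((next + 1))
  save_checkpoint "$next"
done

echo "done: processed all $TOTAL_ITEMS items"
```

Running the script again after a restart resumes from the index stored in the checkpoint file instead of starting over. Bash runs the trap only after the current foreground command finishes, so each unit of work should be short relative to the 30-second window, and idempotent, because an item that was in flight at preemption may be processed twice.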

You can check whether an instance is being preempted by querying the instance metadata server. A common way to do this is with curl:

# From within the VM, check for a preemption notice
curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google"

This returns FALSE normally and TRUE once the instance has started being preempted, at which point you should trigger your application’s shutdown routine. Appending ?wait_for_change=true to the URL makes the request block until the value changes, so you don’t have to poll in a tight loop.
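
For a long-running job, you can keep a watcher blocked on this endpoint rather than polling it. The sketch below is one way to do that, assuming curl is available on the VM. The names is_preempted_value, on_preempt, and watch_preemption are hypothetical, and on_preempt is where your own shutdown logic would go.

```shell
# Preemption watcher sketch, meant to be sourced by a bash script.
# is_preempted_value, on_preempt, and watch_preemption are hypothetical names.
METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/preempted"

# Succeeds if the metadata value signals that preemption has started.
is_preempted_value() {
  [[ "$1" == "TRUE" ]]
}

# Placeholder hook: replace with your own shutdown logic, for example
# sending SIGTERM to the worker process so its checkpoint trap runs.
on_preempt() {
  echo "preemption notice received; saving state"
}

# Block on the metadata server until the preempted value changes,
# then call on_preempt if it became TRUE.
watch_preemption() {
  local value
  while true; do
    if ! value=$(curl -sf -H "Metadata-Flavor: Google" "${METADATA_URL}?wait_for_change=true"); then
      sleep 5   # request failed or server unreachable; retry shortly
      continue
    fi
    if is_preempted_value "$value"; then
      on_preempt
      return 0
    fi
  done
}
```

After sourcing the file, start the watcher in the background next to your job, for example with `watch_preemption &`. Because each request waits for a change, the loop only wakes up when the value changes, when the request times out and returns the current value, or when curl fails.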

The cost savings are significant. A standard n1-standard-8 VM might cost around $0.37 per hour on demand. Google advertises Spot discounts of 60 to 91%, which for this machine type works out to roughly $0.03 to $0.15 per hour. That’s up to a 91% reduction for the same compute power, provided your workload can handle interruptions. Exact prices vary by region and change over time.
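
To make the discount range concrete, here is a back-of-the-envelope calculator. It is a sketch with hypothetical helper names (spot_rate, fleet_cost) that uses the illustrative $0.37 on-demand hourly rate and the 60 to 91% Spot discount range; substitute current prices for your region before relying on the numbers.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope Spot savings calculator. spot_rate and fleet_cost are
# hypothetical helpers; the $0.37/hour rate is illustrative, not a price quote.

# Hourly Spot rate implied by an on-demand rate and a discount percentage.
spot_rate() {
  awk -v on_demand="$1" -v discount="$2" \
    'BEGIN { printf "%.4f\n", on_demand * (1 - discount / 100) }'
}

# Total cost of a fleet: hourly rate x number of VMs x hours.
fleet_cost() {
  awk -v rate="$1" -v vms="$2" -v hours="$3" \
    'BEGIN { printf "%.2f\n", rate * vms * hours }'
}

on_demand=0.37
echo "Spot rate at 60% off: $(spot_rate "$on_demand" 60)"           # 0.1480
echo "Spot rate at 91% off: $(spot_rate "$on_demand" 91)"           # 0.0333
echo "10 VMs x 100 h on demand: $(fleet_cost "$on_demand" 10 100)"  # 370.00
echo "10 VMs x 100 h at 91% off: $(fleet_cost "$(spot_rate "$on_demand" 91)" 10 100)"  # 33.30
```

At these illustrative rates, 1,000 VM-hours drop from $370.00 on demand to $33.30 at the maximum discount.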

Spot VM prices are dynamic: Google can change them over time as supply and demand shift, but they always remain discounted relative to on-demand pricing. You can see current Spot prices on the Compute Engine pricing page or query them programmatically through the Cloud Billing Catalog API.

The primary use cases for Spot VMs are:

  • Batch processing: Jobs that can be broken into smaller chunks and restarted.
  • Data analytics and machine learning training: Long-running computations that can tolerate interruptions and resume from checkpoints.
  • CI/CD runners: Build and test jobs that are inherently stateless and can be easily restarted.
  • Web crawling and rendering: Tasks that don’t require immediate, continuous availability.

When you’re managing a fleet of these, a managed instance group (MIG), optionally with autoscaling, is crucial. You can configure a MIG to create Spot VMs through its instance template, and it will automatically try to recreate preempted instances.

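# Create an instance template whose VMs use the Spot provisioning model
gcloud compute instance-templates create my-spot-template \
  --machine-type=n1-standard-8 \
  --provisioning-model=SPOT
# Create a managed instance group of 3 Spot VMs from that template
gcloud compute instance-groups managed create my-spot-mig \
  --template=my-spot-template \
  --size=3 \
  --zone=us-central1-a

Here, my-spot-template is an instance template created with --provisioning-model=SPOT, so every VM the MIG creates from it is a Spot VM. The MIG tries to keep 3 instances running, recreating any that get preempted once Spot capacity is available again.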

There is no bidding and no maximum price to configure: you pay whatever Spot price is in effect while the instance runs. Because that price always sits below the on-demand rate, your costs never exceed what the same VM would cost on demand, and the goal is to pay much less.

When you delete a Spot VM, it’s just like deleting a regular VM: the underlying resources are released, and you stop incurring charges. If you need a VM that won’t be preempted, you can switch an existing Spot VM to the standard (on-demand) provisioning model while it is stopped.

# Convert a Spot VM to an on-demand instance
gcloud compute instances stop batch-processor --zone=us-central1-a
gcloud compute instances set-scheduling batch-processor \
  --zone=us-central1-a \
  --provisioning-model=STANDARD
gcloud compute instances start batch-processor --zone=us-central1-a

This sequence stops the instance, changes its provisioning model from SPOT to STANDARD, and then restarts it as a regular on-demand instance.

The most common pitfall is not designing your application for preemption, which leads to lost work and frustration. If you run a stateful application that cannot tolerate interruption on Spot VMs, you’re setting yourself up for failure. Nothing is malfunctioning when this happens; your application logic is simply incompatible with the fundamental nature of Spot VMs.

The next step after successfully implementing Spot VMs for interruptible workloads is to explore how to optimize your persistent workloads with Committed Use Discounts or Sustained Use Discounts, which offer different savings models for predictable, always-on instances.
