Cloud SQL High Availability (HA) isn’t just about having a standby; it’s fundamentally about making your database resilient to a single zone failure by keeping a synchronously replicated copy of your data in a second zone, ready to take over within a minute or so rather than after a manual restore.

Let’s see this in action. Imagine you have a us-central1-a primary instance and a us-central1-b standby. A network partition hits us-central1-a.

# On your application server, attempting a query
SELECT * FROM users WHERE id = 1;
# This query will hang for a moment, or fail with a connection error...

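Then, Cloud SQL’s health monitoring detects that the primary is unreachable and fails over to the standby, which already has every committed transaction. The instance keeps the same IP address, so new connections now land on the standby. Connections that were open against the old primary are dropped and must be re-established.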

# The *same* application server, *same* query, after failover completes and the client reconnects
SELECT * FROM users WHERE id = 1;
# This query succeeds, returning data from the standby.

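Failover is not instantaneous. Cloud SQL initiates it after the primary has been unresponsive for roughly 60 seconds, so plan for an interruption on the order of a minute or more, depending on the database engine and workload. The key here is synchronous replication: a write is acknowledged as committed only after it has been replicated to storage in both zones. This is why a failover does not lose committed transactions.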

The Problem Solved: Unplanned downtime due to infrastructure failures (like a zone outage) is a major risk for any application relying on a database. HA for Cloud SQL mitigates this by providing a low RTO (Recovery Time Objective), typically around a minute or so, and a near-zero RPO (Recovery Point Objective) for committed transactions.

How it Works Internally: When you enable HA, Cloud SQL provisions two instances: a primary and a standby. These reside in different zones within the same region. The HA instance exposes a single static IP address (public, private, or both, depending on how you configured it), which acts as the single point of connection for your applications. All writes to the primary are synchronously replicated to storage in the standby’s zone. A heartbeat mechanism continuously monitors the primary. If the primary becomes unresponsive, Cloud SQL automatically fails over: the standby starts serving from the secondary zone behind the same IP address. The standby does not serve traffic until that happens, so it cannot be used to offload reads.
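The mechanics above can be sketched as a toy model. This is not Cloud SQL’s implementation: the class name HAPair, the missed_limit threshold, and the single shared list standing in for storage replicated across zones are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HAPair:
    """Toy model of an HA pair: one serving zone, one standby zone."""
    primary_zone: str
    standby_zone: str
    missed_limit: int = 3  # assumed threshold, not a Cloud SQL setting
    missed: int = 0
    # One shared list models storage that is replicated to both zones.
    committed: list = field(default_factory=list)

    def commit(self, row):
        # Synchronous replication: the write counts as committed only once
        # it is durable in both zones, so the standby never lags behind.
        self.committed.append(row)

    def heartbeat(self, primary_healthy):
        """Record one health check; return True if this check triggered failover."""
        if primary_healthy:
            self.missed = 0
            return False
        self.missed += 1
        if self.missed >= self.missed_limit:
            # Promote the standby: the two zones swap roles.
            self.primary_zone, self.standby_zone = self.standby_zone, self.primary_zone
            self.missed = 0
            return True
        return False


if __name__ == "__main__":
    pair = HAPair("us-central1-a", "us-central1-b")
    pair.commit({"id": 1})
    for _ in range(3):
        pair.heartbeat(primary_healthy=False)
    print(pair.primary_zone, pair.committed)
```

The point of the model is that a failover changes which zone serves traffic, while the committed data stays the same.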

Levers You Control:

  1. Region and Zones: You select the region. Within it, you can either let Cloud SQL pick the two zones for your primary and standby instances or specify the primary and secondary zones yourself. The two zones must differ and must be in the same region.
  2. Machine Type: The machine type (vCPU and memory) for the HA configuration applies to both the primary and standby instances. Ensure it’s sufficient for your workload.
  3. Storage: Storage type (SSD/HDD) and size are mirrored across both instances.
  4. Network: HA works with public IP, private IP, or both; the choice is yours and is not specific to HA. Authorized networks control which external addresses can reach the public IP, while private IP access is governed by your VPC configuration.
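As a rough illustration of how these levers map onto instance creation, here is a minimal sketch that assembles a gcloud sql instances create command line. The helper name build_create_command and every value passed to it (instance name, tier, database version, storage size) are hypothetical. Check the flags against gcloud sql instances create --help for your gcloud version before relying on them.

```python
import shlex


def build_create_command(name, region, zone, secondary_zone, tier,
                         database_version="POSTGRES_15",
                         storage_type="SSD", storage_size_gb=100):
    """Return an argv list for creating a regional (HA) Cloud SQL instance."""
    if zone == secondary_zone:
        raise ValueError("primary and secondary zones must differ")
    return [
        "gcloud", "sql", "instances", "create", name,
        f"--database-version={database_version}",
        "--availability-type=REGIONAL",       # REGIONAL turns on HA
        f"--region={region}",
        f"--zone={zone}",                     # primary zone
        f"--secondary-zone={secondary_zone}", # standby zone
        f"--tier={tier}",                     # machine type for both instances
        f"--storage-type={storage_type}",
        f"--storage-size={storage_size_gb}GB",
    ]


if __name__ == "__main__":
    argv = build_create_command(
        "my-ha-instance",      # hypothetical instance name
        "us-central1",
        "us-central1-a",
        "us-central1-b",
        "db-custom-2-7680",    # hypothetical tier: 2 vCPU, 7680 MB RAM
    )
    # Print the command for review instead of executing it.
    print(shlex.join(argv))
```

Building the argument list without running it keeps the sketch safe to execute anywhere; pass the list to subprocess.run only after reviewing the output.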

The most surprising thing about Cloud SQL HA is how little it asks of your application during failover. The connection string doesn’t change; the instance’s IP address stays the same and is simply served by the newly active instance in the other zone. What failover does not do is preserve open connections. They are dropped, so the only application-level requirement is to reconnect and retry, ideally with backoff.
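A minimal sketch of that reconnect-and-retry behavior follows. TransientDBError is a stand-in for whatever connection error your database driver raises, and make_flaky_query is a hypothetical fake that imitates a database that is unreachable for a few attempts; neither is part of any real library.

```python
import time


class TransientDBError(Exception):
    """Stand-in for a driver's connection error (hypothetical)."""


def run_with_retry(operation, attempts=6, base_delay=1.0, max_delay=30.0,
                   sleep=time.sleep):
    """Call operation(); on TransientDBError, wait and retry with exponential backoff."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDBError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)


def make_flaky_query(failures):
    """Hypothetical query that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def query():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientDBError("connection lost during failover")
        return [{"id": 1, "name": "example"}]

    return query


if __name__ == "__main__":
    waits = []  # record the backoff delays instead of actually sleeping
    rows = run_with_retry(make_flaky_query(3), sleep=waits.append)
    print(rows, waits)
```

In a real application, operation would open a fresh connection (or borrow one from a pool that discards dead connections) before running the query, and the total retry budget should comfortably exceed the failover time you expect.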

The next critical concept to understand is how to verify your HA setup is truly working as expected, beyond just enabling it.
