Load Balancing + Auto Scaling: Scale Behind a Load Balancer (2026)

The magic of auto-scaling isn’t just about adding more servers; it’s about making them invisible to your users.

Let’s say you have a web application. Users hit a single DNS name, like myapp.example.com. This name doesn’t point directly to your servers. It points to a load balancer. The load balancer’s job is to distribute incoming traffic across a pool of your application servers.

Here’s what that looks like in practice. We’ll use AWS as an example, but the concepts are universal.

First, we need an Auto Scaling Group (ASG). This is the brain of our scaling operation. It defines the minimum and maximum number of servers we want, and critically, what kind of servers to launch.

{
  "AutoScalingGroupName": "my-app-asg",
  "LaunchConfigurationName": "my-app-launch-config",
  "MinSize": 2,
  "MaxSize": 10,
  "DesiredCapacity": 2,
  "AvailabilityZones": ["us-east-1a", "us-east-1b", "us-east-1c"],
  "HealthCheckGracePeriod": 300,
  "Tags": [
    {"Key": "Name", "Value": "MyAppServer", "PropagateAtLaunch": true}
  ]
}

Next, LaunchConfigurationName points to a launch configuration. This tells the ASG how to build a new server when it needs one: the EC2 instance type, the AMI (Amazon Machine Image), the security groups, and any user data script to run on boot. AWS now recommends launch templates over launch configurations for new setups, but the idea is the same.

// ImageId is your application's AMI
{
  "LaunchConfigurationName": "my-app-launch-config",
  "ImageId": "ami-0abcdef1234567890",
  "InstanceType": "t3.medium",
  "SecurityGroups": ["sg-0123456789abcdef0"],
  "UserData": "#!/bin/bash\n# Install and start your web server here\napt-get update -y\napt-get install -y nginx\nsystemctl start nginx\n"
}

Now, the crucial piece for load balancing: the TargetGroup. This is a logical grouping of your application servers that the load balancer knows how to send traffic to. It also defines health checks.

{
  "TargetGroupName": "my-app-tg",
  "Protocol": "HTTP",
  "Port": 80,
  "VpcId": "vpc-0123456789abcdef0",
  "HealthCheckProtocol": "HTTP",
  "HealthCheckPath": "/",
  "HealthCheckIntervalSeconds": 30,
  "HealthCheckTimeoutSeconds": 5,
  "UnhealthyThresholdCount": 2,
  "HealthyThresholdCount": 3
}
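The two threshold settings above are easier to reason about with a small model. What follows is a minimal Python sketch, not AWS code: the `TargetHealth` class and `seconds_until_healthy` helper are hypothetical names, and the assumption is that a target changes state only after the configured number of consecutive results disagree with its current state.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Tracks one target's state from consecutive health check results (simplified model)."""
    healthy_threshold: int = 3    # HealthyThresholdCount from the target group
    unhealthy_threshold: int = 2  # UnhealthyThresholdCount from the target group
    healthy: bool = False         # assumption: a new target starts out not yet healthy
    streak: int = 0               # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Apply one health check result and return the resulting state."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so any opposing streak resets.
            self.streak = 0
            return self.healthy
        self.streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self.streak >= needed:
            self.healthy = check_passed
            self.streak = 0
        return self.healthy


def seconds_until_healthy(interval_seconds: int = 30, healthy_threshold: int = 3) -> int:
    """Rough lower bound on how long a new target waits before it can receive traffic."""
    return interval_seconds * healthy_threshold


target = TargetHealth()
states = [target.record(result) for result in [True, True, True, False, False]]
print(states)                   # [False, False, True, True, False]
print(seconds_until_healthy())  # 90
```

With the values in the config above, a fresh instance needs roughly 90 seconds of passing checks before it receives traffic, and two consecutive failures take it out of rotation.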

The ASG needs to be told to register its instances with this target group. You do that by attaching the target group to the ASG. From then on, the ASG registers every instance it launches and deregisters every instance it terminates.

// When creating or updating the ASG, you'd specify this association.
// With the AWS CLI, it's done via 'aws autoscaling attach-load-balancer-target-groups'
{
  "AutoScalingGroupName": "my-app-asg",
  "TargetGroupARNs": [
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg/abcdef1234567890"
  ]
}

Finally, the Load Balancer itself (e.g., an Application Load Balancer or ALB). We configure it to listen on port 80 for incoming HTTP requests and forward them to our my-app-tg target group.

// ALB Listener Configuration
{
  "DefaultActions": [
    {
      "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg/abcdef1234567890",
      "Type": "forward"
    }
  ],
  "Port": 80,
  "Protocol": "HTTP"
}

When traffic hits myapp.example.com, the DNS name resolves to the ALB. The ALB checks which of the instances in my-app-tg are healthy (passing their health checks), picks one of them, and forwards the request. If an instance fails its health checks, the ALB stops sending traffic to it. The ASG will also terminate and replace that instance, but only when its HealthCheckType is ELB; with the default of EC2, the ASG reacts only to EC2 status checks, and an instance whose application has crashed can stay in the group indefinitely.
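The routing step can be sketched in a few lines of Python. This is a simplified model, not the ALB's implementation: `RoundRobinBalancer` and the instance IDs are hypothetical, the selection is plain round robin over whichever targets are currently healthy, and the sketch simply raises when no target is healthy, which a real load balancer may handle differently.

```python
from itertools import count


class RoundRobinBalancer:
    """Cycles through the currently healthy targets (illustrative model only)."""

    def __init__(self, targets):
        # Hypothetical mapping of instance id -> healthy flag; dicts keep insertion order.
        self.health = {target: True for target in targets}
        self._counter = count()

    def mark(self, target, healthy):
        """Record the outcome of the health checks for one target."""
        self.health[target] = healthy

    def pick(self):
        """Return the next healthy target, skipping any marked unhealthy."""
        healthy = [target for target, ok in self.health.items() if ok]
        if not healthy:
            raise RuntimeError("no healthy targets to route to")
        return healthy[next(self._counter) % len(healthy)]


lb = RoundRobinBalancer(["i-a", "i-b", "i-c"])
before = [lb.pick() for _ in range(3)]
lb.mark("i-b", False)  # i-b starts failing its health checks
after = [lb.pick() for _ in range(4)]
print(before)
print(after)
```

After `i-b` is marked unhealthy, every subsequent request goes to `i-a` or `i-c`; users never see the failed instance.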

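The ASG itself scales based on metrics. You’d set up scaling policies, for example a step policy such as "If CPU utilization across the ASG averages over 70% for 5 minutes, add 2 instances," or "If inbound network traffic exceeds 100 MB/s, add 1 instance." A simpler option is a target tracking policy: you name a target value for a metric, and the ASG adds or removes instances to stay near it.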

// Example scaling policy: target tracking that keeps average CPU utilization near 70%
{
  "AutoScalingGroupName": "my-app-asg",
  "PolicyName": "ScaleUpCPU",
  "PolicyType": "TargetTrackingScaling",
  "TargetTrackingConfiguration": {
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    }
  }
}
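To build intuition for what a target tracking policy is aiming at, here is a rough Python sketch. AWS does not publish the exact algorithm, so the proportional formula below is an assumption for illustration, and `proposed_capacity` is a hypothetical helper. The one part that mirrors documented behavior is that the result stays within MinSize and MaxSize.

```python
import math


def proposed_capacity(current_capacity: int, metric_value: float, target_value: float,
                      min_size: int, max_size: int) -> int:
    """Estimate the instance count that would bring the metric back to its target.

    Assumption: load spreads evenly, so the average metric scales inversely
    with the number of instances.
    """
    if current_capacity <= 0 or target_value <= 0:
        raise ValueError("current_capacity and target_value must be positive")
    raw = math.ceil(current_capacity * metric_value / target_value)
    # Scaling policies never move the group outside its MinSize/MaxSize bounds.
    return max(min_size, min(max_size, raw))


# Values taken from the configs above: MinSize 2, MaxSize 10, target 70% CPU.
print(proposed_capacity(2, 91.0, 70.0, 2, 10))   # 3: scale out
print(proposed_capacity(4, 35.0, 70.0, 2, 10))   # 2: scale in
print(proposed_capacity(10, 95.0, 70.0, 2, 10))  # 10: capped at MaxSize
```

The third call shows why MaxSize matters: the estimate asks for more capacity than the group allows, so the group stays at 10 and CPU remains above target.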

The ASG also has a DesiredCapacity, and it continually works to keep that many instances in service. When the ASG launches an instance, it registers the instance with the target group; once the instance passes its health checks, the ALB starts sending it traffic. If an instance is terminated because it was unhealthy, the ASG deregisters it from the target group and launches a replacement to bring the count back up to DesiredCapacity. Scale-in is different: the scaling policy lowers DesiredCapacity itself (never below MinSize), so the terminated instance is not replaced.

A common pitfall is forgetting to associate the ASG with the target group. Without this, the ASG launches instances, but the load balancer never knows about them, and traffic never reaches them. The ASG might think it’s scaled out, but the load balancer’s target group remains empty, and users see errors.
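A quick audit can catch this before users do. The sketch below is pure Python over dictionaries shaped like the `AutoScalingGroups` entries returned by `aws autoscaling describe-auto-scaling-groups`; the `find_unattached_groups` helper and the second group name are hypothetical, and it assumes attached target groups appear under the `TargetGroupARNs` key.

```python
def find_unattached_groups(groups):
    """Return the names of ASGs that have no target group attached."""
    return [
        group["AutoScalingGroupName"]
        for group in groups
        if not group.get("TargetGroupARNs")  # a missing key and an empty list both count
    ]


# Sample data for illustration; in practice this would come from the describe call.
groups = [
    {
        "AutoScalingGroupName": "my-app-asg",
        "TargetGroupARNs": [
            "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg/abcdef1234567890"
        ],
    },
    {"AutoScalingGroupName": "my-other-asg", "TargetGroupARNs": []},  # hypothetical group
]

unattached = find_unattached_groups(groups)
print(unattached)  # ['my-other-asg']
```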

The most surprising thing is how the load balancer and auto-scaling group work in tandem: the ASG creates the servers and registers them with the target group, and the load balancer health-checks them and directs traffic. They are distinct but tightly coupled. By default, the ASG doesn’t act on the load balancer’s health checks, and the load balancer knows nothing about the ASG’s scaling policies. The association is the bridge.

Consider a scenario where your application has a brief, intense spike in traffic. The ASG reacts to a CPU metric and launches two new instances. These instances boot, the application starts, and the ASG registers them with the ALB’s target group. Once they pass the target group’s health checks, the ALB starts sending them traffic. When the spike subsides, the CPU metric drops, and the ASG scales back in, terminating instances. All this happens without manual intervention.

The one thing most people don’t realize is that the DesiredCapacity in the Auto Scaling Group isn’t just a static number; it’s the ASG’s target state. If the ASG has a MaxSize of 10 but its DesiredCapacity is set to 2, it will only ever aim for 2 instances unless a scaling policy or a manual update changes the DesiredCapacity. MinSize and MaxSize are boundaries; DesiredCapacity is the immediate goal.
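The "target state" idea can be made concrete with a toy reconciliation loop. This is a simplified Python model, not how AWS implements it: the `Group` class, its method names, and the instance ID format are hypothetical, and scale-in here removes the oldest instance first purely for simplicity.

```python
from itertools import count


class Group:
    """Toy Auto Scaling Group that converges on its desired capacity."""

    def __init__(self, min_size, max_size, desired):
        self.min_size = min_size
        self.max_size = max_size
        self.desired = desired
        self.instances = []
        self._ids = count()

    def apply_policy_request(self, requested):
        """A scaling policy asks for a capacity; the bounds clamp what it gets."""
        self.desired = max(self.min_size, min(self.max_size, requested))

    def terminate(self, instance_id):
        """Remove an instance, for example after it was found unhealthy."""
        self.instances.remove(instance_id)

    def reconcile(self):
        """Launch or terminate instances until the count matches desired."""
        while len(self.instances) < self.desired:
            self.instances.append(f"i-{next(self._ids):04d}")
        while len(self.instances) > self.desired:
            self.instances.pop(0)  # simplification: oldest instance goes first
        return list(self.instances)


group = Group(min_size=2, max_size=10, desired=2)
initial = group.reconcile()       # two instances launched
group.terminate(initial[0])       # one instance is lost
replaced = group.reconcile()      # a replacement brings the count back to 2
group.apply_policy_request(14)    # a policy asks for more than MaxSize
scaled_out = group.reconcile()    # capped at 10
group.apply_policy_request(1)     # a policy asks for less than MinSize
scaled_in = group.reconcile()     # floored at 2
print(len(initial), len(replaced), len(scaled_out), len(scaled_in))
```

Nothing in the loop looks at MaxSize when deciding how many instances to run; only the desired value drives it, and the bounds matter solely when a policy tries to change that value.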

The next thing you’ll run into is configuring more advanced scaling policies, like schedule-based scaling for predictable traffic patterns or step scaling for more granular control.