This is happening because your load balancer is successfully receiving requests but is unable to establish a healthy connection with any of its configured backend servers, leading to upstream timeouts and 502 Bad Gateway errors.
Common Causes and Fixes
1. Backend Server Unreachable (Network)
- Diagnosis: From the load balancer’s perspective, can it ping or connect to the backend IP/port?
If this fails, the load balancer cannot even initiate a TCP connection.nc -vz <backend_ip> <backend_port> - Fix: Ensure firewall rules on the backend server (e.g.,
iptables,ufw, cloud provider security groups) explicitly allow inbound traffic from the load balancer’s IP address (or subnet) on the application port.
This works by allowing the fundamental network handshake between the load balancer and the backend.# Example for iptables on backend server sudo iptables -A INPUT -s <load_balancer_ip> -p tcp --dport <backend_port> -j ACCEPT sudo iptables -A INPUT -p tcp --dport <backend_port> -j DROP # Drop others if restrictive - Fix: Verify that the backend server’s network interface is active and has the correct IP address configured.
This ensures the server is actually listening on the network interface the load balancer is trying to reach.ip addr show
2. Backend Application Not Running or Listening
- Diagnosis: Is the application process actually running on the backend server and listening on the expected port?
You should see a process listening onsudo netstat -tulnp | grep <backend_port> # Or on newer systems sudo ss -tulnp | grep <backend_port>0.0.0.0:<backend_port>or<backend_ip>:<backend_port>. - Fix: Start or restart the application service. For example, if using systemd:
This ensures the application is active and ready to accept incoming connections.sudo systemctl restart <your_app_service_name>
3. Incorrect Backend Server IP/Port in Load Balancer Config
- Diagnosis: Double-check the
serverdirectives in your load balancer configuration file (e.g., Nginxnginx.confor HAProxyhaproxy.cfg).- Nginx:
# In http block or server block upstream backend_servers { server 192.168.1.100:8080; server 192.168.1.101:8080; } - HAProxy:
backend my_app server app1 192.168.1.100:8080 check server app2 192.168.1.101:8080 check
- Nginx:
- Fix: Correct any typos or outdated IP addresses/ports in the load balancer’s upstream or backend definitions. Reload the load balancer configuration.
This ensures the load balancer is trying to connect to the correct destinations.# Nginx sudo nginx -s reload # HAProxy sudo systemctl reload haproxy
4. Health Check Failures
- Diagnosis: Load balancers often perform health checks. If all backends fail their health checks, the load balancer will route no traffic. Check the load balancer’s status page or logs for health check details.
- Nginx (with
nginx-plus-module-vtsor similar): Check the VTS dashboard for upstream status. - HAProxy: Access the stats socket or port:
Look for backend servers in aecho "show stat" | sudo socat stdio /var/run/haproxy/admin.sockDOWNorN/Astate.
- Nginx (with
- Fix: Ensure the health check endpoint on the backend application is correctly configured, accessible, and returns a success code (e.g., HTTP 200 OK).
- Nginx:
location /healthz { access_log off; return 200 "OK"; } - HAProxy:
backend my_app option httpchk GET /healthz http-check expect status 200 server app1 192.168.1.100:8080 check
- Nginx:
5. Backend Application Crashing Under Load or Misbehaving
- Diagnosis: Check the logs of the backend application instances for errors, stack traces, or indications of resource exhaustion (e.g., out of memory, too many open files).
Look for sudden spikes in errors correlating with the 502s.sudo journalctl -u <your_app_service_name> -f # Or check specific log files: tail -f /var/log/your_app.log - Fix: Optimize the application code, increase server resources (CPU, RAM), or tune application-specific connection limits or thread pools. This addresses the root cause within the application that prevents it from responding to the load balancer’s health checks or application requests.
6. Load Balancer Resource Exhaustion
- Diagnosis: Monitor the load balancer’s CPU, memory, and network connections. High CPU or a large number of
CLOSE_WAITorESTABLISHEDconnections can indicate it’s overwhelmed.top htop netstat -an | grep ESTABLISHED | wc -l - Fix: Scale up the load balancer instance (more CPU/RAM) or scale out by adding more load balancer instances behind a DNS round-robin or another higher-level load balancer. This ensures the load balancer itself has the capacity to process requests and manage connections.
7. Incorrect proxy_pass or backend configuration (e.g., missing port, wrong protocol)
- Diagnosis: Review the load balancer’s configuration for the directives responsible for forwarding traffic.
- Nginx:
proxy_pass http://backend_servers;orproxy_pass http://192.168.1.100:8080; - HAProxy:
server app1 192.168.1.100:8080Ensure the protocol (http/https) and port match what the backend application is expecting. If the backend is listening on HTTPS, the load balancer needs to be configured for SSL termination or pass-through correctly.
- Nginx:
- Fix: Correct the
proxy_passorserverdirective to specify the correct protocol and port.# Example: If backend is on HTTPS proxy_pass https://backend_servers;
This ensures the load balancer speaks the correct "language" to the backend.# Example: If backend is on HTTPS server app1 192.168.1.100:8443 ssl verify none
Once these are resolved, you’ll likely encounter errors related to application-level issues, such as missing dependencies or incorrect request routing within your application.