Blue-green deployments are a way to reduce downtime and risk when releasing new versions of your application.
Here’s a simplified HAProxy configuration demonstrating the concept:

    frontend http_front
        bind *:80
        default_backend http_back

    backend http_back
        balance roundrobin
        # Blue backend - current production version
        server blue1 192.168.1.10:8080 check
        server blue2 192.168.1.11:8080 check

    backend http_back_green
        balance roundrobin
        # Green backend - new version, currently idle
        server green1 192.168.1.20:8080 check
        server green2 192.168.1.21:8080 check

In this setup, http_front listens on port 80 and sends all traffic to http_back by default. http_back currently forwards that traffic to the "blue" (production) servers. The http_back_green backend is configured, and its servers are health-checked, but no frontend routes incoming traffic to it.
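To switch to the "green" version, you’d modify haproxy.cfg so that default_backend points to http_back_green and then reload HAProxy. Prefer a reload over a restart: a restart stops the running process and drops established connections, whereas a reload starts new workers with the new configuration and lets the old ones finish the connections they already hold.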

    frontend http_front
        bind *:80
        # Switch default backend to green
        default_backend http_back_green

    backend http_back
        balance roundrobin
        server blue1 192.168.1.10:8080 check
        server blue2 192.168.1.11:8080 check

    backend http_back_green
        balance roundrobin
        server green1 192.168.1.20:8080 check
        server green2 192.168.1.21:8080 check

After reloading HAProxy (e.g., sudo systemctl reload haproxy), all new incoming connections are directed to the green environment, while connections that were already established keep talking to blue until they close. The blue environment remains untouched and can be used for rollback if needed.
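The manual edit above can be scripted. The following is a minimal sketch, not a definitive tool: it assumes the configuration lives at /etc/haproxy/haproxy.cfg (a common default, but an assumption here), that HAProxy is managed by systemd, and that the frontend is named http_front as in the example. The function names are hypothetical.

```python
import os
import re
import subprocess

# Section keywords that start a new block in haproxy.cfg.
SECTION = re.compile(r"^(global|defaults|frontend|backend|listen)\b\s*(\S*)")


def switch_default_backend(cfg_text: str, frontend: str, new_backend: str) -> str:
    """Return cfg_text with the default_backend of `frontend` set to `new_backend`."""
    out = []
    in_target = False
    replaced = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        match = SECTION.match(stripped)
        if match:
            # Track whether we are inside the frontend we want to change.
            in_target = match.group(1) == "frontend" and match.group(2) == frontend
        elif in_target and stripped.startswith("default_backend"):
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}default_backend {new_backend}"
            replaced = True
        out.append(line)
    if not replaced:
        raise ValueError(f"no default_backend found in frontend {frontend!r}")
    return "\n".join(out) + "\n"


def apply_switch(new_backend: str, cfg_path: str = "/etc/haproxy/haproxy.cfg") -> None:
    """Rewrite the config, validate it, then reload HAProxy (assumed paths and unit name)."""
    with open(cfg_path) as f:
        updated = switch_default_backend(f.read(), "http_front", new_backend)
    candidate = cfg_path + ".new"
    with open(candidate, "w") as f:
        f.write(updated)
    # `haproxy -c -f FILE` only checks the configuration and exits.
    subprocess.run(["haproxy", "-c", "-f", candidate], check=True)
    os.replace(candidate, cfg_path)
    subprocess.run(["systemctl", "reload", "haproxy"], check=True)
```

Calling apply_switch("http_back_green") performs the cutover, and apply_switch("http_back") is the rollback. Validating the candidate file before replacing the live one means a typo cannot take the load balancer down during a reload.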
The core problem blue-green deployments solve is the risk of updating a live system. An in-place update can cause a brief outage or leave the system partially updated if it fails midway. Instead, you maintain two identical production environments. One, the "blue" environment, runs the current version of your application. The other, the "green" environment, receives no live traffic.
When you’re ready to deploy a new version, you deploy it to the green environment. This allows you to test the new version thoroughly in a production-like setting without impacting live users. Once you’re confident the green environment is stable and functioning correctly, you redirect traffic from the blue environment to the green environment. This is typically done by updating a load balancer’s configuration.
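One way to make "confident the green environment is stable" concrete is a readiness gate that must pass before the switch. The sketch below assumes each green server exposes a health endpoint at /healthz that returns HTTP 200 when healthy; that path, the number of passes, and the function names are illustrative assumptions, not something the setup above defines.

```python
import http.client
import time
import urllib.request
from typing import Callable, Sequence

# Hypothetical health URLs for the green servers from the example config.
GREEN_URLS = [
    "http://192.168.1.20:8080/healthz",
    "http://192.168.1.21:8080/healthz",
]


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and non-2xx responses (HTTPError).
        return False


def green_is_ready(
    urls: Sequence[str],
    probe: Callable[[str], bool] = http_probe,
    required_passes: int = 3,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Require every server to pass the probe on several consecutive rounds."""
    for attempt in range(required_passes):
        if not all(probe(url) for url in urls):
            return False
        if attempt < required_passes - 1:
            sleep(pause)
    return True
```

A deployment script would call green_is_ready(GREEN_URLS) and only proceed to the traffic switch when it returns True. Requiring several consecutive passes is a design choice that filters out a server that answers once and then crashes during warm-up.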
The beauty of this approach is that the switchover is near-instantaneous. If any issues arise with the new green version, you can immediately switch traffic back to the blue environment, effectively rolling back the deployment with minimal disruption. The blue environment, now idle, can be updated with the next version or kept as a fallback.
The key components are:
- Two identical production environments: One active (blue), one idle (green).
- A traffic router: Usually a load balancer (like HAProxy, Nginx, or a cloud provider’s load balancer) that directs incoming requests to one of the environments.
- A deployment process: To deploy the new version to the idle environment.
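The components listed above, together with the switch and rollback behaviour described earlier, can be captured in a toy model. This is purely illustrative: the class and method names are made up for this sketch and do not correspond to any real load balancer API.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    name: str
    version: str


@dataclass
class BlueGreenRouter:
    """Toy traffic router: exactly one environment is active at a time."""
    active: Environment
    idle: Environment

    def deploy_to_idle(self, version: str) -> None:
        # The deployment process only ever touches the idle environment.
        self.idle.version = version

    def switch(self) -> None:
        # Cutover and rollback are the same operation: swap the two roles.
        self.active, self.idle = self.idle, self.active

    def route(self) -> str:
        # Every request goes to whichever environment is currently active.
        return f"{self.active.name}:{self.active.version}"


router = BlueGreenRouter(Environment("blue", "v1"), Environment("green", "v1"))
router.deploy_to_idle("v2")      # live traffic still sees blue:v1
before = router.route()
router.switch()                  # cutover to green:v2
after = router.route()
router.switch()                  # rollback to blue:v1
rolled_back = router.route()
```

The point of the model is that rollback needs no redeployment: the previous version is still running, so switching back is as cheap as switching forward.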
Consider a scenario where your application has a database schema change. With a blue-green deployment, you’d deploy the new application code (which expects the new schema) to the green environment, and you would also need to run the database migration. Coordinating the two is the crucial part. If blue and green share one database, a migration applied before the switch must be backward compatible (for example, adding a nullable column rather than renaming one), so that blue keeps working and remains a valid rollback target. Alternatively, you might deploy a transitional version of your application that can handle both the old and new schema, sometimes by writing to both (a "dual-write" approach). Once the database is migrated and the green application is verified, you switch traffic.
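To illustrate migrating before the switch, here is a small sketch using an in-memory SQLite database. The users table, the email column, and the two insert functions are hypothetical stand-ins for the blue and green application code. The migration only adds a nullable column, so the old code keeps working after it runs.

```python
import sqlite3


def blue_insert(conn: sqlite3.Connection, name: str) -> None:
    """Old (blue) code path: knows nothing about the email column."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def green_insert(conn: sqlite3.Connection, name: str, email: str) -> None:
    """New (green) code path: writes the new column as well."""
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))


def expand_schema(conn: sqlite3.Connection) -> None:
    """Additive migration: a nullable column does not break existing writers."""
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

blue_insert(conn, "ada")                              # before the migration
expand_schema(conn)                                   # migrate while blue is live
blue_insert(conn, "grace")                            # blue still works afterwards
green_insert(conn, "linus", "linus@example.com")      # green uses the new column

rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
```

Rows written by blue simply carry NULL in the new column. A destructive change, such as dropping or renaming a column that blue still reads, would have to wait until blue is no longer needed as a fallback.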
The surprising thing about blue-green deployments is how often they are implemented solely at the load balancer level, ignoring the stateful components that might need careful handling. While redirecting HTTP traffic is straightforward, if your application relies on persistent connections, caches, or in-flight data, a simple load balancer switch might leave those in an inconsistent state. For instance, if you have long-running WebSocket connections in the blue environment, simply switching the load balancer won’t gracefully disconnect them. They will either time out or continue to interact with the old version until they close naturally, potentially causing user confusion or errors.
To truly master this, you need to consider how to gracefully drain connections from the blue environment before switching, or how to manage stateful services like databases and caches that might be shared or require coordinated updates alongside the application. This might involve custom scripts that signal the application to stop accepting new requests, complete in-flight operations, and then allow the load balancer to make the switch.
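As one possible shape for such a script, the sketch below uses HAProxy's runtime API to put the blue servers into drain state and then polls the session counters until they reach zero. It assumes a stats socket has been enabled in the global section with admin level (for example, `stats socket /run/haproxy/admin.sock level admin`); the socket path and the function names are assumptions for illustration.

```python
import csv
import io
import socket
import time


def send_command(sock_path: str, command: str) -> str:
    """Send one command to the HAProxy stats socket and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # HAProxy closes the socket after the reply
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def current_sessions(show_stat_csv: str, backend: str) -> dict:
    """Map server name to current session count (scur) for one backend."""
    # The header line of `show stat` starts with "# ", which we strip.
    reader = csv.DictReader(io.StringIO(show_stat_csv.lstrip("# ")))
    sessions = {}
    for row in reader:
        if row["pxname"] == backend and row["svname"] not in ("FRONTEND", "BACKEND"):
            sessions[row["svname"]] = int(row["scur"] or 0)
    return sessions


def drain_backend(sock_path, backend, servers, timeout=300.0, poll=5.0) -> bool:
    """Mark servers as draining, then wait until their sessions reach zero."""
    for name in servers:
        send_command(sock_path, f"set server {backend}/{name} state drain")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        sessions = current_sessions(send_command(sock_path, "show stat"), backend)
        if all(sessions.get(name, 0) == 0 for name in servers):
            return True
        time.sleep(poll)
    return False  # some connections outlived the timeout
```

A call might look like drain_backend("/run/haproxy/admin.sock", "http_back", ["blue1", "blue2"]). Note that draining every server in the backend that is still the default leaves new requests with no server to go to, so the drain has to be timed together with the switch. Long-lived connections such as WebSockets may never reach zero on their own, which is why the timeout returns False rather than waiting forever.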
The next logical step is understanding how to automate this entire process using CI/CD pipelines and infrastructure-as-code tools.