A circuit breaker doesn’t just prevent failures; it actively uses failures to protect your system from itself.
Let’s watch it in action. Imagine a user-service that calls an order-service. If order-service starts failing, user-service shouldn’t keep hammering it. That’s where a circuit breaker, often implemented with a library like opossum, steps in.
Here’s a simplified user-service making a call to order-service without a circuit breaker:
const axios = require('axios');

async function getUserOrders(userId) {
  try {
    const response = await axios.get(`http://order-service:3000/orders/${userId}`);
    return response.data;
  } catch (error) {
    console.error(`Order service failed for user ${userId}:`, error.message);
    // In a real app, we might return a default or an empty array.
    // But the key is, we're *still trying* on the next request.
    return [];
  }
}
Now, let’s wrap that call with opossum:
const axios = require('axios');
const CircuitBreaker = require('opossum');

const orderServiceBreaker = new CircuitBreaker(async (userId) => {
  // This is the function that might fail
  const response = await axios.get(`http://order-service:3000/orders/${userId}`);
  return response.data;
}, {
  // Configuration options
  errorThresholdPercentage: 50, // If 50% of calls fail, trip the breaker
  resetTimeout: 30000, // After 30 seconds, allow a trial call (half-open)
  timeout: 10000 // Treat a call as failed if it takes longer than 10 seconds
});

// The breaker is an event emitter; log every state change.
orderServiceBreaker.on('open', () => {
  console.warn('ORDER SERVICE CIRCUIT OPENED');
});
orderServiceBreaker.on('halfOpen', () => {
  console.warn('ORDER SERVICE CIRCUIT HALF-OPEN');
});
orderServiceBreaker.on('close', () => {
  console.warn('ORDER SERVICE CIRCUIT CLOSED');
});
orderServiceBreaker.on('reject', () => {
  console.warn('ORDER SERVICE CALL REJECTED');
});

async function getUserOrders(userId) {
  try {
    // We now call the breaker, not directly the service
    const orders = await orderServiceBreaker.fire(userId);
    return orders;
  } catch (error) {
    // Reached when the call itself fails or times out, and also when the
    // breaker is open, in which case 'fire' rejects immediately.
    console.error(`Order service call failed or was rejected for user ${userId}:`, error.message);
    return []; // Fallback while order-service is unavailable
  }
}
The user-service needs to call order-service to get user orders. If order-service is slow or down, user-service shouldn’t just keep retrying indefinitely. Doing so can overwhelm order-service even further, leading to a cascade where user-service itself becomes unresponsive due to the slow downstream dependency. The circuit breaker acts as a protective layer.
When order-service starts failing (e.g., network errors, timeouts, or 5xx status codes, which axios surfaces as rejected promises by default), the circuit breaker records each failure. If the share of failed calls reaches a configurable percentage (errorThresholdPercentage) within a rolling statistics window (rollingCountTimeout in opossum, 10 seconds by default), the circuit breaker "trips" or "opens." Once open, any subsequent calls to order-service via the breaker are immediately rejected without even attempting the actual network request. This prevents user-service from wasting resources on calls that are very likely to fail and, crucially, gives order-service breathing room to recover.
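The tripping rule can be sketched without any library. The following is a minimal illustration, not opossum's actual implementation: the class name FailureWindow and its methods are hypothetical, timestamps are passed in explicitly so the behavior is easy to follow, and the use of "greater than or equal" for the threshold comparison is an assumption (real libraries may compare strictly).

```javascript
// Minimal sketch of threshold-based tripping. Hypothetical names; not opossum's code.
class FailureWindow {
  constructor({ windowMs = 10000, errorThresholdPercentage = 50 } = {}) {
    this.windowMs = windowMs;
    this.errorThresholdPercentage = errorThresholdPercentage;
    this.samples = []; // each sample: { at: timestamp in ms, ok: boolean }
  }

  // Record one call outcome and drop samples that fell out of the window.
  record(ok, now = Date.now()) {
    this.samples.push({ at: now, ok });
    this.samples = this.samples.filter((s) => now - s.at < this.windowMs);
  }

  // Percentage of failed calls among the samples currently retained.
  errorPercentage() {
    if (this.samples.length === 0) return 0;
    const failures = this.samples.filter((s) => !s.ok).length;
    return (failures / this.samples.length) * 100;
  }

  // Assumption: trip when the failure rate reaches the threshold.
  shouldTrip() {
    return this.errorPercentage() >= this.errorThresholdPercentage;
  }
}

const failureWindow = new FailureWindow({ windowMs: 1000, errorThresholdPercentage: 50 });
failureWindow.record(true, 0);
failureWindow.record(true, 100);
failureWindow.record(false, 200);
console.log(failureWindow.shouldTrip()); // false: 1 failure out of 3 calls
failureWindow.record(false, 300);
console.log(failureWindow.shouldTrip()); // true: 2 failures out of 4 calls
```

Because old samples age out, a burst of failures stops counting against the service once the window has moved past it.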
After a timeout period (resetTimeout), the breaker enters a "half-open" state. In this state, it allows a single test call to order-service. If this test call succeeds, the breaker closes, and normal operation resumes. If it fails, the breaker immediately opens again, and the resetTimeout wait starts over. This automated recovery removes the need for manual intervention after transient failures.
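The open, half-open, and closed transitions described above can be sketched as a small state machine. This is an illustrative model only: BreakerState, trip, allowRequest, and reportTrial are hypothetical names, and the clock is passed in as a parameter so the transitions can be traced by hand.

```javascript
// Minimal sketch of the closed -> open -> halfOpen cycle. Hypothetical names.
class BreakerState {
  constructor({ resetTimeout = 30000 } = {}) {
    this.resetTimeout = resetTimeout;
    this.state = 'closed';
    this.openedAt = null;
  }

  // Called when the failure threshold has been reached.
  trip(now = Date.now()) {
    this.state = 'open';
    this.openedAt = now;
  }

  // Decide whether a call may go through to the downstream service.
  allowRequest(now = Date.now()) {
    if (this.state === 'closed') return true;
    if (this.state === 'open' && now - this.openedAt >= this.resetTimeout) {
      this.state = 'halfOpen'; // let exactly one trial call through
      return true;
    }
    return false; // still open, or a trial call is already in flight
  }

  // Report the outcome of the trial call made in the halfOpen state.
  reportTrial(succeeded, now = Date.now()) {
    if (this.state !== 'halfOpen') return;
    if (succeeded) {
      this.state = 'closed';
      this.openedAt = null;
    } else {
      this.trip(now); // reopen and restart the resetTimeout wait
    }
  }
}

const breakerState = new BreakerState({ resetTimeout: 30000 });
breakerState.trip(1000);
console.log(breakerState.allowRequest(2000));  // false: breaker is open
console.log(breakerState.allowRequest(31000)); // true: trial call in halfOpen
breakerState.reportTrial(true, 31200);
console.log(breakerState.state);               // closed
```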
The core problem this solves is unbounded resource consumption during downstream failures. Without a circuit breaker, a failing service can lead to thread exhaustion, connection pool depletion, and increased latency in the calling service, eventually causing it to fail as well. The circuit breaker explicitly limits the impact of a failing dependency, isolating the problem and providing a graceful degradation path (e.g., returning cached data or an empty response).
The configuration options are key:
errorThresholdPercentage: This is the percentage of failures that will cause the circuit to open. A value of 50 means if half the calls fail, it trips.
resetTimeout: This is the duration, in milliseconds, that the circuit breaker will remain open before transitioning to the halfOpen state. 30000 means 30 seconds.
timeout: This is the maximum time, in milliseconds, that the circuit breaker will wait for a single operation to complete before considering it a failure. 10000 means 10 seconds. Note that opossum’s name for this option is timeout, not failureTimeout; an unrecognized key such as failureTimeout is ignored, and the default of 10 seconds applies.
successThreshold: Not an opossum option, but common in other circuit breaker implementations. It is the number of successful calls required in the halfOpen state to close the circuit. opossum closes the circuit after a single successful trial call.
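For reference, here is a sketch of a fuller options object of the kind that could be passed as the second argument to opossum’s CircuitBreaker constructor. The variable name breakerOptions and the specific values are illustrative assumptions. rollingCountTimeout and volumeThreshold are additional opossum options not used in the example earlier: the first sets the length of the statistics window, the second sets a minimum number of calls in that window before the breaker is allowed to open.

```javascript
// Illustrative values only; tune them to your own traffic and latency profile.
const breakerOptions = {
  timeout: 10000,               // fail a call that takes longer than 10 seconds
  errorThresholdPercentage: 50, // failure-rate threshold for tripping the breaker
  resetTimeout: 30000,          // stay open for 30 seconds before a trial call
  rollingCountTimeout: 10000,   // length of the rolling statistics window, in ms
  volumeThreshold: 5            // do not trip until at least 5 calls are in the window
};
```

A volume threshold is worth considering for low-traffic services, where a single failed call out of two would otherwise already count as a 50% failure rate.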
The opossum library, like many circuit breaker implementations, exposes events like open, halfOpen, close, and reject. These are invaluable for monitoring and alerting. You can hook into these events to trigger alerts when a service is struggling or when the breaker has recovered.
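As a rough sketch of the monitoring idea, the helper below counts breaker events so they can be exported to a metrics system. trackBreakerEvents is a hypothetical helper, and a plain Node EventEmitter stands in for the breaker so the snippet runs without dependencies; with opossum you would pass the breaker instance itself, since it emits the same event names.

```javascript
const { EventEmitter } = require('events');

// Attach counters to any emitter that fires breaker-style events.
function trackBreakerEvents(breaker, events = ['open', 'halfOpen', 'close', 'reject']) {
  const counts = Object.fromEntries(events.map((name) => [name, 0]));
  for (const name of events) {
    breaker.on(name, () => {
      counts[name] += 1;
    });
  }
  return counts;
}

// Stand-in for a real breaker (assumption: same event names as in the example above).
const fakeBreaker = new EventEmitter();
const eventCounts = trackBreakerEvents(fakeBreaker);
fakeBreaker.emit('open');
fakeBreaker.emit('reject');
fakeBreaker.emit('reject');
console.log(eventCounts); // { open: 1, halfOpen: 0, close: 0, reject: 2 }
```

A steadily climbing reject count while the open count stays flat is a useful alert signal: it means callers are still being turned away and the dependency has not recovered.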
The most surprising thing about circuit breakers is how much they reward fast failure reporting from the downstream service. If order-service returns 503 Service Unavailable quickly, failures accumulate quickly and the breaker trips sooner. Conversely, if the downstream service hangs on a request, the timeout configured on the circuit breaker is what ultimately dictates how long the calling service waits before counting the operation as a failure, and connections stay tied up for that entire time. Keep in mind that a breaker timeout typically only stops waiting; the underlying HTTP request may keep running unless it is also cancelled, for example by giving axios its own timeout.
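To make the timeout behavior concrete, here is a minimal sketch of a per-call timeout built on Promise.race. withTimeout is a hypothetical helper, not part of opossum, and the ETIMEDOUT code on the error is a choice made for this sketch. It also shows the caveat: the wrapper stops waiting, but it does not cancel the hanging operation.

```javascript
// Reject if `operation` has not settled within `ms` milliseconds.
// The underlying operation is NOT cancelled; we only stop waiting for it.
function withTimeout(operation, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const error = new Error(`Timed out after ${ms}ms`);
      error.code = 'ETIMEDOUT';
      reject(error);
    }, ms);
  });
  // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
  return Promise.race([Promise.resolve().then(operation), timeout])
    .finally(() => clearTimeout(timer));
}

// A call that never settles, standing in for a hung downstream service.
const hangingCall = () => new Promise(() => {});

withTimeout(hangingCall, 20).catch((error) => {
  console.log(error.code); // ETIMEDOUT
});
```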
Once you have circuit breakers in place, the next logical step is to implement robust fallback strategies for when the breaker is open, such as returning stale data from a cache or a predefined default response.
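One possible shape for such a fallback is a wrapper that remembers the last good result per key and serves it when the call fails. This is a sketch under stated assumptions: withStaleFallback and flakyLoadOrders are hypothetical, the cache is an unbounded in-memory Map with no expiry, and a production version would need eviction and a staleness limit. opossum also lets you register a fallback function on the breaker itself via breaker.fallback(...), which could delegate to a cache like this one.

```javascript
// Wrap an async loader so that a failure falls back to the last good value
// for the same key, or to `defaultValue` if nothing has been cached yet.
function withStaleFallback(load, defaultValue) {
  const lastGood = new Map(); // assumption: unbounded, no expiry
  return async function guarded(key) {
    try {
      const value = await load(key);
      lastGood.set(key, value);
      return value;
    } catch (error) {
      return lastGood.has(key) ? lastGood.get(key) : defaultValue;
    }
  };
}

// Hypothetical loader standing in for a breaker-protected call.
let orderServiceHealthy = true;
async function flakyLoadOrders(userId) {
  if (!orderServiceHealthy) throw new Error('order-service unavailable');
  return [`order-for-${userId}`];
}

const getOrdersWithFallback = withStaleFallback(flakyLoadOrders, []);
```

While the loader is healthy, results pass through and are cached; once it starts failing, previously seen users get their stale orders and unseen users get the empty default.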