A token bucket rate limiter can admit more of a bursty workload than a strict fixed-rate limit would, because it allows short spikes that exceed the average rate while still capping the long-run rate.
Let’s see it in action. Imagine we have a service that can handle 10 requests per second on average, but sometimes users hit it with 50 requests in a single second. Without a rate limiter, this burst can overwhelm the service, leading to errors for everyone. With a token bucket, we can configure it to allow an average of 10 requests per second, but with a "burst" capacity of, say, 20 tokens.
Here’s a Go implementation:
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/time/rate"
)

func main() {
	// Create a new token bucket rate limiter, shared by every request.
	// rate.Limit(10): the average rate of requests allowed per second.
	// 20: the maximum number of tokens (burst size) the bucket can hold.
	limiter := rate.NewLimiter(rate.Limit(10), 20)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Try to acquire a token.
		// limiter.Allow() returns true if a token was available, false otherwise.
		// This is a non-blocking operation.
		if !limiter.Allow() {
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}

		// If a token was acquired, process the request.
		fmt.Fprintln(w, "Request processed!")
	})

	fmt.Println("Starting server on :8080")
	// ListenAndServe only returns on failure, so report the error and exit.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
When a request comes in, limiter.Allow() checks if there’s a token available in the bucket. If there is, it consumes one and returns true, allowing the request to proceed. If the bucket is empty (meaning the recent rate of requests has exceeded the average rate), limiter.Allow() returns false, and we return a 429 Too Many Requests status.
The tokens in the bucket are replenished at a constant rate (rate.Limit(10), that is, one token every 100 ms). The burst size (20) caps how many tokens can accumulate. This means that once the service has been idle long enough for the bucket to fill (2 seconds from empty), it can admit a burst of up to 20 requests immediately. Once those tokens are spent, further requests are admitted only as fast as tokens are replenished, 10 per second, and the bucket refills only when traffic drops below that rate.
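As a rough check on these numbers, the admission limit can be written as a bound. The symbols are introduced here for illustration and do not come from the library: r is the refill rate, b the burst size, and N(t) the number of requests admitted in any window of length t.

```latex
N(t) \le b + r\,t
\qquad\text{with } r = 10\ \text{s}^{-1},\ b = 20:
\qquad N(0) \le 20,\quad N(1\,\text{s}) \le 30,\quad t_{\text{refill}} = \frac{b}{r} = 2\ \text{s}
```

Applied to the 50-request spike from the example: if all 50 arrive at the same instant against a full bucket, 20 are admitted and 30 are rejected; if they are spread across the second, at most 30 get through.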
The core idea is that the rate limiter doesn’t simply reject every request that exceeds the average rate. Nothing is queued or buffered; instead, unused capacity is banked as tokens, up to the burst size. Each token represents permission to process one request. Tokens are added at a steady rate, and requests consume them. If a request arrives when no tokens are available, it’s rejected. The banked tokens are what allow temporary deviations from the average rate.
The rate package (golang.org/x/time/rate, maintained by the Go team but not part of the standard library) is excellent for this. rate.NewLimiter(rate.Limit(10), 20) creates a limiter that allows 10 events per second, with a burst size of 20. rate.Limit is a defined type whose underlying type is float64 (not a type alias), so you can specify fractional rates like rate.Limit(0.5) for one request every two seconds; rate.Every(2*time.Second) expresses the same limit as an interval. The second argument, the burst size, is an integer representing the maximum number of tokens the bucket can hold.
The Allow() method is non-blocking. If you need to block until a token is available, use limiter.Wait(ctx), which returns an error if the context is canceled or if the wait would run past the context’s deadline. This is useful when you would rather delay a request than reject it immediately.
A subtle but important point is the limiter’s initial state. A limiter returned by rate.NewLimiter does not start with an empty bucket; it behaves as if the bucket is already filled to its burst capacity. This is why a newly started limiter can admit a burst of up to 20 requests right away, before any time has passed for tokens to be replenished.
This token bucket approach is particularly effective for APIs where clients might have legitimate reasons for occasional bursts of activity, but you still need to protect your backend from sustained overload. You’re essentially providing a smooth average rate while accommodating the natural variability of client behavior.
The next hurdle is handling distributed rate limiting across multiple service instances.