Rate-Limit Requests with Istio and Envoy Rate Limit Service (2026)

The most surprising thing about rate limiting is that it’s not primarily about preventing abuse; it’s about maintaining service stability by managing load.

Let’s see Istio and Envoy’s rate limiting in action. Imagine we have a simple service, frontend, that calls an api service. We want to ensure the api service isn’t overwhelmed by too many requests from frontend.

Here’s a basic Istio VirtualService that routes the traffic frontend sends to the api service:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: frontend-api
spec:
  hosts:
  - api.default.svc.cluster.local # the host that frontend calls
  http:
  - route:
    - destination:
        host: api.default.svc.cluster.local
        port:
          number: 8080

Now, let’s introduce rate limiting. We’ll use Envoy’s HTTP rate limit filter (envoy.filters.http.ratelimit), which integrates with an external Rate Limit Service. The service, not the proxy, keeps the counters and decides whether a request is within its limits.

First, we need to deploy a Rate Limit Service. A common choice is the reference implementation from the Envoy project (envoyproxy/ratelimit).

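# Deployment for the Envoy Rate Limit Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istio-ratelimit
  labels:
    app: istio-ratelimit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: istio-ratelimit
  template:
    metadata:
      labels:
        app: istio-ratelimit
    spec:
      containers:
      - name: ratelimit
        # Images are published as envoyproxy/ratelimit and tagged by commit hash.
        # Replace the placeholder with a tag you have tested.
        image: envoyproxy/ratelimit:<commit-sha>
        command: ["/bin/ratelimit"]
        # The service is configured through environment variables, not flags.
        env:
        - name: GRPC_PORT # defaults to 8081
          value: "8080"
        - name: PORT # HTTP port, defaults to 8080
          value: "8081"
        - name: RUNTIME_ROOT
          value: /data
        - name: RUNTIME_SUBDIRECTORY
          value: ratelimit
        - name: RUNTIME_WATCH_ROOT
          value: "false"
        - name: RUNTIME_IGNOREDOTFILES
          value: "true"
        - name: USE_STATSD
          value: "false"
        # Counters are stored in Redis. This assumes a Redis Service
        # named "redis" in the same namespace.
        - name: REDIS_SOCKET_TYPE
          value: tcp
        - name: REDIS_URL
          value: redis:6379
        ports:
        - name: grpc
          containerPort: 8080
        - name: http
          containerPort: 8081
        volumeMounts:
        - name: config-volume
          # Limit files are read from $RUNTIME_ROOT/$RUNTIME_SUBDIRECTORY/config
          mountPath: /data/ratelimit/config
      volumes:
      - name: config-volume
        configMap:
          name: ratelimit-config
---
# Service that lets the Envoy sidecars reach the Rate Limit Service
apiVersion: v1
kind: Service
metadata:
  name: istio-ratelimit
spec:
  selector:
    app: istio-ratelimit
  ports:
  - name: grpc
    port: 8080
    targetPort: 8080
  - name: http
    port: 8081
    targetPort: 8081
---
# ConfigMap holding the actual rate limits
apiVersion: v1
kind: ConfigMap
metadata:
  name: ratelimit-config
data:
  rate_limits.yaml: |
    domain: api.example.com
    descriptors:
      - key: "global"
        rate_limit:
          requests_per_unit: 100
          unit: MINUTE
      - key: "request_method"
        value: "GET"
        rate_limit:
          requests_per_unit: 50
          unit: MINUTE
      # Envoy's remote_address action always emits the key "remote_address".
      - key: "remote_address"
        rate_limit:
          requests_per_unit: 200
          unit: MINUTE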

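This configuration defines a domain (api.example.com) and several descriptors. Descriptors are the key-value pairs that Envoy uses to categorize requests for rate limiting. A rule that sets only a key applies to every value of that key and counts each distinct value separately. We have a global limit (100 requests per minute), a limit for GET requests (50 per minute), and a limit per client IP address (200 per minute).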
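
To make the matching rules concrete, here is a minimal Python sketch of how a descriptor could be looked up in such a configuration. It is a simplified model, not the service’s actual code: the CONFIG dictionary and the find_limit function are illustrative names, and only two of the rules above are included.

```python
from typing import Optional

# Simplified stand-in for a parsed rate_limits.yaml (illustrative only).
CONFIG = {
    "domain": "api.example.com",
    "descriptors": [
        {"key": "global",
         "rate_limit": {"requests_per_unit": 100, "unit": "MINUTE"}},
        {"key": "request_method", "value": "GET",
         "rate_limit": {"requests_per_unit": 50, "unit": "MINUTE"}},
    ],
}


def find_limit(rules: list, entries: list) -> Optional[dict]:
    """Return the rate_limit that matches a descriptor's (key, value) entries."""
    limit = None
    for i, (key, value) in enumerate(entries):
        # Prefer a rule that names both key and value, then a key-only rule.
        rule = next((r for r in rules
                     if r["key"] == key and r.get("value") == value), None)
        if rule is None:
            rule = next((r for r in rules
                         if r["key"] == key and "value" not in r), None)
        if rule is None:
            break  # nothing matches this entry, so no limit applies
        # A limit counts only if it sits at the same depth as the last entry.
        if "rate_limit" in rule and i == len(entries) - 1:
            limit = rule["rate_limit"]
        rules = rule.get("descriptors", [])  # descend into nested rules
    return limit


rules = CONFIG["descriptors"]
print(find_limit(rules, [("request_method", "GET")]))   # the 50 per minute rule
print(find_limit(rules, [("request_method", "POST")]))  # None: no rule for POST
print(find_limit(rules, [("global", "all")]))           # the 100 per minute rule
```

Note that in this model a descriptor with two entries only matches a rule nested two levels deep, so a top-level request_method rule does not apply to a descriptor whose first entry has a different key.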

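Next, we need to tell Istio to use this Rate Limit Service. The VirtualService API has no field for rate limits, so we patch the proxy configuration with an EnvoyFilter instead. The example below targets the sidecar of the api workload. It assumes that the api pods carry the label app: api and that a Kubernetes Service named istio-ratelimit in the default namespace exposes the gRPC port 8080.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: api-ratelimit
spec:
  workloadSelector:
    labels:
      app: api # assumed label of the api pods
  configPatches:
  # Patch 1: insert the HTTP rate limit filter before the router filter.
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
          domain: api.example.com # must equal the domain in rate_limits.yaml
          failure_mode_deny: false # fail open if the service is unreachable
          rate_limit_service:
            transport_api_version: V3
            grpc_service:
              envoy_grpc:
                cluster_name: outbound|8080||istio-ratelimit.default.svc.cluster.local
  # Patch 2: attach rate limit actions to the inbound virtual hosts.
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_INBOUND
    patch:
      operation: MERGE
      value:
        rate_limits:
        # Per-client limit, keyed by the request's source IP.
        - actions:
          - remote_address: {}
        # Per-method limit, keyed by the HTTP method.
        - actions:
          - request_headers:
              header_name: ":method"
              descriptor_key: "request_method"
        # Global limit, using a fixed descriptor entry.
        - actions:
          - generic_key:
              descriptor_key: "global"
              descriptor_value: "all"

In this EnvoyFilter:

The first patch inserts the envoy.filters.http.ratelimit filter ahead of the router filter and points it at the Rate Limit Service cluster.
The second patch adds a rate_limits section to the sidecar’s inbound virtual hosts. Each entry in rate_limits is a list of actions, and each entry produces one descriptor, so the three entries here produce three independent descriptors. Envoy sends them to the Rate Limit Service in a single request.
remote_address: {} tells Envoy to send the client’s IP address as a descriptor entry. The key is always remote_address, so the matching rule in rate_limits.yaml must use that key.
request_headers: { header_name: ":method", descriptor_key: "request_method" } tells Envoy to send the value of the :method pseudo-header (GET, POST, and so on) as a descriptor entry with the key request_method.
generic_key: { descriptor_key: "global", descriptor_value: "all" } sends a fixed descriptor entry, which is useful for service-wide limits. descriptor_value is required; descriptor_key defaults to generic_key when omitted.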
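
The relationship between actions and descriptors is easy to get wrong, so here is a rough Python model of it. It is a sketch under stated assumptions, not Envoy’s implementation: build_descriptors, RATE_LIMITS, and the request dictionary are made-up names, and only the three action types discussed here are handled. It illustrates that the actions inside one rate_limits entry combine into a single descriptor, while separate entries yield separate descriptors.

```python
def build_descriptors(rate_limits: list, request: dict) -> list:
    """Build one descriptor (a list of (key, value) entries) per rate_limits entry."""
    descriptors = []
    for rate_limit in rate_limits:
        entries = []
        for action in rate_limit["actions"]:
            if "remote_address" in action:
                entries.append(("remote_address", request["remote_address"]))
            elif "request_headers" in action:
                spec = action["request_headers"]
                value = request["headers"].get(spec["header_name"])
                if value is None:
                    entries = None  # a missing header drops the whole descriptor
                    break
                entries.append((spec["descriptor_key"], value))
            elif "generic_key" in action:
                spec = action["generic_key"]
                key = spec.get("descriptor_key", "generic_key")
                entries.append((key, spec["descriptor_value"]))
        if entries:
            descriptors.append(entries)
    return descriptors


# Hypothetical configuration: the first entry combines two actions.
RATE_LIMITS = [
    {"actions": [
        {"remote_address": {}},
        {"request_headers": {"header_name": ":method",
                             "descriptor_key": "request_method"}},
    ]},
    {"actions": [
        {"generic_key": {"descriptor_key": "global", "descriptor_value": "all"}},
    ]},
]

request = {"remote_address": "10.0.0.7", "headers": {":method": "GET"}}
for descriptor in build_descriptors(RATE_LIMITS, request):
    print(descriptor)
```

Here the first entry produces one descriptor with two entries (remote_address and request_method), which would need a nested rule on the Rate Limit Service. Putting each action in its own rate_limits entry keeps the limits independent.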

When a request reaches an Envoy proxy that has the rate limit filter configured, the following happens:

1. Envoy checks whether the matched route or its virtual host defines any rate_limits actions.
2. If so, it executes those actions to build the descriptors.
3. It sends the descriptors, together with the domain, to the configured Rate Limit Service (in our case, the istio-ratelimit deployment).
4. The Rate Limit Service consults its configuration (rate_limits.yaml) and determines whether the request, based on its descriptors, exceeds any defined limit.
5. If a limit is exceeded, the Rate Limit Service answers OVER_LIMIT, and Envoy returns a 429 Too Many Requests status code to the client.
6. If no limit is exceeded, the Rate Limit Service answers OK, and Envoy continues processing the request toward the api service.
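
The decision flow above can be sketched in a few lines of Python. This is an illustration only: FakeRateLimitService and handle are hypothetical names, the counters never reset (there is no time window), and limits are looked up by exact descriptor rather than through a rule tree.

```python
from collections import Counter


class FakeRateLimitService:
    """Toy stand-in for the Rate Limit Service, without time windows."""

    def __init__(self, limits: dict):
        self.limits = limits   # descriptor -> maximum number of requests
        self.hits = Counter()  # descriptor -> requests seen so far

    def should_rate_limit(self, descriptors: list) -> str:
        """Count one hit per descriptor; any descriptor over its limit denies."""
        over = False
        for descriptor in descriptors:
            self.hits[descriptor] += 1
            limit = self.limits.get(descriptor)
            if limit is not None and self.hits[descriptor] > limit:
                over = True
        return "OVER_LIMIT" if over else "OK"


def handle(service: FakeRateLimitService, descriptors: list) -> int:
    """Model the proxy's reaction: 429 on OVER_LIMIT, otherwise forward (200)."""
    if not descriptors:
        return 200  # no actions configured, so the service is never called
    answer = service.should_rate_limit(descriptors)
    return 429 if answer == "OVER_LIMIT" else 200


# A descriptor is a tuple of (key, value) entries so it can be a dict key.
get_descriptor = (("request_method", "GET"),)
service = FakeRateLimitService({get_descriptor: 2})
print([handle(service, [get_descriptor]) for _ in range(3)])  # [200, 200, 429]
```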

The domain in rate_limits.yaml must match the domain that the Envoy rate limit filter sends with each request. Istio does not derive this value from the VirtualService host or the service’s FQDN; it is an arbitrary string set in the domain field of the envoy.filters.http.ratelimit filter configuration. You can name it after the service (api.default.svc.cluster.local), use a more general label such as api.example.com, or pick anything else, as long as both sides use exactly the same string.

Envoy communicates with the Rate Limit Service over gRPC. The reference envoyproxy/ratelimit implementation does not keep its counters in process memory; it stores them in a shared backend such as Redis, so that several replicas enforce the same limits. Each counter tracks the requests for one descriptor over the configured time unit (second, minute, hour, or day).
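
As a rough illustration of counting per time unit, the sketch below derives a counter key from the domain, the descriptor entries, and the start of the current window, so that a new window automatically starts a fresh counter. The key layout, the counter_key function, and the FixedWindowCounters class are assumptions made for this example; a plain dictionary stands in for the shared store.

```python
UNIT_SECONDS = {"SECOND": 1, "MINUTE": 60, "HOUR": 3600, "DAY": 86400}


def counter_key(domain: str, entries: list, unit: str, now: int) -> str:
    """Build a key that changes whenever a new time window begins."""
    size = UNIT_SECONDS[unit]
    window_start = (int(now) // size) * size
    parts = [domain] + [f"{k}_{v}" for k, v in entries] + [str(window_start)]
    return "_".join(parts)


class FixedWindowCounters:
    """Fixed-window counters; the dict stands in for a shared store."""

    def __init__(self):
        self.counts = {}

    def over_limit(self, domain, entries, limit, unit, now) -> bool:
        key = counter_key(domain, entries, unit, now)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] > limit


entries = [("request_method", "GET")]
print(counter_key("api.example.com", entries, "MINUTE", 125))
# api.example.com_request_method_GET_120

counters = FixedWindowCounters()
# Three requests in the window starting at t=120, with a limit of 2.
print([counters.over_limit("api.example.com", entries, 2, "MINUTE", t)
       for t in (120, 130, 140)])  # [False, False, True]
# The next window (t=180) starts from zero again.
print(counters.over_limit("api.example.com", entries, 2, "MINUTE", 180))  # False
```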

The most common pitfall is a descriptor key mismatch: the key that an action emits (for example the descriptor_key of a request_headers action, or the fixed remote_address key) does not match any key in rate_limits.yaml on the Rate Limit Service. Envoy still sends the descriptors, but the Rate Limit Service has no rule to apply to them, so the intended limit is silently bypassed. Another common issue is that the Envoy sidecars cannot reach the Rate Limit Service. By default the filter then fails open and allows all traffic; with failure_mode_deny set to true, it returns a 500 instead.
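
A key mismatch can be caught before deployment with a small check like the one below. It is a hypothetical helper, not part of Istio or Envoy: first_key and unmatched_keys are invented names, it covers only the three action types used in this article, and it compares just the first key of each rate_limits entry against the top-level rules.

```python
def first_key(rate_limit: dict):
    """Return the descriptor key emitted by the first action of an entry."""
    action = rate_limit["actions"][0]
    if "remote_address" in action:
        return "remote_address"
    if "request_headers" in action:
        return action["request_headers"]["descriptor_key"]
    if "generic_key" in action:
        return action["generic_key"].get("descriptor_key", "generic_key")
    return None  # action type not covered by this sketch


def unmatched_keys(rate_limits: list, config: dict) -> list:
    """List emitted keys that no top-level rule in the config refers to."""
    known = {rule["key"] for rule in config["descriptors"]}
    keys = (first_key(rl) for rl in rate_limits)
    return [k for k in keys if k is not None and k not in known]


# Example: the rule is named client_ip, but the action emits remote_address.
config = {"descriptors": [
    {"key": "request_method", "value": "GET"},
    {"key": "client_ip"},
]}
rate_limits = [
    {"actions": [{"remote_address": {}}]},
    {"actions": [{"request_headers": {"header_name": ":method",
                                      "descriptor_key": "request_method"}}]},
]
print(unmatched_keys(rate_limits, config))  # ['remote_address']
```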

Once you have rate limiting in place, the next logical step is to implement circuit breaking to gracefully handle situations where a service is consistently failing or overloaded, even after rate limiting.