Rate limiting isn’t just about preventing abuse; it’s a crucial mechanism for ensuring service stability and predictable performance across your APIs.

Let’s see how this plays out in practice across three popular API gateways: Kong, AWS API Gateway, and Nginx.

Kong

Kong implements rate limiting through plugins, which can be scoped globally or to a service, route, or consumer.

Scenario: You have a public API endpoint /users that you want to limit to 100 requests per minute per IP address.

Configuration:

Kong’s rate limiting is typically managed through its rate-limiting plugin. You’d apply this plugin to a service or a specific route.

{
  "name": "rate-limiting",
  "config": {
    "minute": 100,
    "limit_by": "ip",
    "policy": "local"
  },
  "enabled": true,
  "route": {
    "id": "YOUR_ROUTE_ID"
  }
}

Explanation:

  • minute: 100: The maximum number of requests allowed per minute. The plugin also accepts second, hour, day, month, and year, and several windows can be set at once.
  • limit_by: "ip": The entity each counter is kept for. Other values include consumer, credential, service, header, and path.
  • policy: "local": Counters are stored in memory on each Kong node. Each node counts independently, so a multi-node deployment can admit more than the configured limit. Use "redis" (or "cluster", which stores counters in Kong’s datastore) to share counters across nodes.
  • route: Scopes the plugin to a single route. Use service instead to cover every route of a service.

When the limit is exceeded, Kong returns a 429 Too Many Requests response.
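
To make the counting behavior concrete, here is a minimal Python sketch of a per-key fixed-window counter. It assumes fixed one-minute windows and in-memory storage, similar in spirit to the "local" policy; the class and method names are hypothetical, and this is an illustration rather than Kong's actual implementation.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Per-key fixed-window counter (illustrative sketch, not Kong's code)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # Maps (key, window index) -> requests seen in that window.
        # Old windows are never evicted in this sketch.
        self.counters = defaultdict(int)

    def check(self, key, now):
        """Return the status a gateway would send: 200 or 429."""
        window = int(now // self.window_seconds)
        if self.counters[(key, window)] >= self.limit:
            return 429
        self.counters[(key, window)] += 1
        return 200


limiter = FixedWindowLimiter(limit=100, window_seconds=60)

# 101 requests from one IP inside the same minute: the last one is rejected.
statuses = [limiter.check("203.0.113.7", now=30.0) for _ in range(101)]
print(statuses.count(200), statuses[-1])  # 100 429

# A different IP, and the same IP in the next minute, both start fresh.
print(limiter.check("198.51.100.2", now=30.0))  # 200
print(limiter.check("203.0.113.7", now=61.0))   # 200
```

Because each node would hold its own counters dictionary, two nodes behind a load balancer could together admit up to twice the limit, which is the trade-off the "local" policy makes.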

AWS API Gateway

AWS API Gateway offers built-in throttling that can be configured at the account, stage, or method level, and per client through usage plans.

Scenario: You want to limit a specific method, GET /items, within your API to 50 requests per second per API key.

Configuration:

You configure throttling in the API Gateway console or through CloudFormation/CDK.

  1. Method Throttling: Method-level limits are set per stage. Open the stage, select the method, and override the stage’s default throttling. These limits apply to all callers of the method combined, not to each API key.

    • Throttling Burst Limit: Set to 100 (the number of requests that can be served at once before throttling starts, which leaves room for short spikes).
    • Throttling Rate Limit: Set to 50 requests per second.
  2. Usage Plan Throttling (for API Keys): For per-API-key throttling, you associate the API stage and the client’s API key with a Usage Plan.

    • Create a Usage Plan.
    • Set the "Rate" to 50 requests per second and "Burst" to 100.
    • Add your API stage and associate an API key with this Usage Plan.
    • Mark GET /items as requiring an API key; otherwise the Usage Plan limits are not applied to it.

Explanation:

  • Burst Limit: The capacity of the token bucket that API Gateway uses for throttling. It is the number of requests that can be served at once before further requests are throttled, which lets short spikes through.
  • Rate Limit: The steady-state number of requests per second, which is the rate at which the bucket refills.
  • Usage Plans: The mechanism for throttling and metering individual clients, identified by their API keys.

When a client exceeds the configured limits, API Gateway returns a 429 Too Many Requests response.
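
The interaction between rate and burst is easier to see in code. The following is a minimal token-bucket sketch in Python using the scenario's numbers (rate 50, burst 100). The class name and the explicit now argument are assumptions made for illustration; this is not how API Gateway is implemented internally.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate            # tokens added per second
        self.capacity = burst       # maximum tokens the bucket can hold
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, now):
        """Consume one token if available; False means 'throttle (429)'."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=50, burst=100)

# 150 requests arriving at the same instant: only the burst is served.
served = sum(bucket.allow(now=0.0) for _ in range(150))
print(served)  # 100

# One second later, 50 tokens have been refilled.
served_later = sum(bucket.allow(now=1.0) for _ in range(150))
print(served_later)  # 50
```

A client can therefore spike to 100 requests at once, but sustained traffic settles at 50 requests per second.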

Nginx

Nginx provides powerful rate limiting through its limit_req_zone and limit_req directives.

Scenario: You want to limit requests to /api/v1/data to 10 requests per minute per IP address.

Configuration:

This is typically done in your nginx.conf or a site-specific configuration file within /etc/nginx/conf.d/.

http {
    # Define a shared-memory zone for rate limiting.
    # $binary_remote_addr: the client's IP address, used as the key
    # zone=mylimit:10m:    a zone named "mylimit" with 10 MB of storage
    # rate=10r/m:          10 requests per minute
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/m;

    server {
        listen 80;
        server_name example.com;

        location /api/v1/data {
            # Apply the zone defined above.
            # burst=15: accept up to 15 requests beyond the rate
            # nodelay:  forward those burst requests immediately instead of queueing them
            limit_req zone=mylimit burst=15 nodelay;
            proxy_pass http://backend_server;
        }

        # Other locations...
    }
}

Explanation:

  • limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/m;: Declared in the http block. It defines the key, the storage, and the rate.

    • $binary_remote_addr: This variable contains the client’s IP address in binary format (4 bytes for IPv4, 16 bytes for IPv6), which is more compact than the string form in $remote_addr.
    • zone=mylimit:10m: Defines a shared memory zone named mylimit with a capacity of 10MB. Nginx uses this zone to store the rate limiting state for each client IP. The 10m is the size of the zone, not a time period; it bounds how many clients can be tracked at once. According to the Nginx documentation, 1MB holds about 16,000 64-byte states or about 8,000 128-byte states.
    • rate=10r/m: The rate is required and is given in requests per second (r/s) or requests per minute (r/m); there are no hourly or daily units. Nginx tracks requests at millisecond granularity, so 10r/m means one request every 6 seconds, not 10 requests at any point within a minute.
  • limit_req zone=mylimit burst=15 nodelay;: This directive is placed within the location block and applies the mylimit zone. The burst and nodelay parameters belong here, not on limit_req_zone.

    • burst=15: Up to 15 requests beyond the rate are accepted. Requests beyond the burst are rejected.
    • nodelay: Requests within the burst are forwarded immediately. Without nodelay, they are queued and released at the configured rate (one every 6 seconds here). Requests beyond the burst are rejected either way.

When the limit is hit, Nginx returns a 503 Service Temporarily Unavailable response by default. You can customize this response using limit_req_status.
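
The leaky-bucket accounting behind limit_req can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it uses exact fractions where Nginx uses integer arithmetic in thousandths of a request, so borderline cases may differ, and the class and method names are hypothetical.

```python
from fractions import Fraction


class LeakyBucket:
    """Per-client state for a limit_req-style limiter (illustrative only)."""

    def __init__(self, rate_per_minute, burst, nodelay):
        self.rate = Fraction(rate_per_minute, 60)  # requests drained per second
        self.burst = burst
        self.nodelay = nodelay
        self.excess = None  # requests above the rate; None before the first request
        self.last = 0       # time of the last accepted request, in seconds

    def handle(self, now):
        """Return the delay in seconds before forwarding, or None if rejected."""
        if self.excess is None:
            excess = Fraction(0)
        else:
            drained = self.rate * (now - self.last)
            excess = max(Fraction(0), self.excess - drained + 1)
        if excess > self.burst:
            return None  # rejected: 503 by default, or limit_req_status
        self.excess, self.last = excess, now
        return Fraction(0) if self.nodelay else excess / self.rate


# With nodelay: 20 simultaneous requests, 1 + burst are accepted at once.
fast = LeakyBucket(rate_per_minute=10, burst=15, nodelay=True)
results = [fast.handle(now=0) for _ in range(20)]
print(sum(r is not None for r in results))  # 16

# Without nodelay: accepted burst requests are spaced 6 seconds apart.
queued = LeakyBucket(rate_per_minute=10, burst=15, nodelay=False)
print([int(queued.handle(now=0)) for _ in range(3)])  # [0, 6, 12]
```

In both modes the same number of requests is accepted; nodelay only changes whether the accepted burst is forwarded immediately or paced.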

Note the difference in defaults: Nginx returns 503, whereas Kong and AWS API Gateway return 429. Clients that back off only on 429 will not recognize Nginx’s throttling unless you set limit_req_status 429; in the Nginx configuration.
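
On the client side, one way to cope with both status codes is to treat either as a signal to back off. The sketch below is a hypothetical helper, not part of any gateway's SDK. It assumes headers is a plain dict with an exact "Retry-After" key and only understands the delay-in-seconds form of that header. Since a 503 can also indicate a real outage, treating it as throttling is itself an assumption.

```python
def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if not a throttle response."""
    if status not in (429, 503):
        return None
    # Prefer the server's hint when it is a whole number of seconds.
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        return min(float(retry_after), cap)
    # Otherwise fall back to capped exponential backoff: base * 2^attempt.
    return min(base * 2 ** attempt, cap)


print(retry_delay(429, {"Retry-After": "5"}, attempt=0))  # 5.0
print(retry_delay(503, {}, attempt=3))                    # 8.0
print(retry_delay(404, {}, attempt=0))                    # None
```

A production client would typically add random jitter to the backoff so that many throttled clients do not retry in lockstep.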

The next step in managing API traffic often involves implementing circuit breakers, which work in conjunction with rate limiting to gracefully handle degraded service.
