Next.js Rate Limiting: Throttle Requests at the Edge (2026)

The most surprising thing about rate limiting is that it’s often implemented after the request has already hit your application server, meaning you’ve already paid the cost of processing it.

Let’s see Next.js rate limiting in action, specifically at the edge. Imagine a simple API route in Next.js that fetches data.

// pages/api/data.js
export default async function handler(req, res) {
  // Simulate some work
  await new Promise(resolve => setTimeout(resolve, 100));

  res.status(200).json({ message: "Here's your data!" });
}

Without any rate limiting, if you hit this endpoint 1000 times concurrently from a single IP, your server would try to handle all those requests, potentially leading to slowdowns or even crashes.

This is where edge rate limiting comes in. Instead of waiting for the request to reach your Next.js server, we can intercept it at the edge, typically at your CDN or deployment platform (like Vercel, Cloudflare, etc.).

Consider Next.js middleware deployed on a platform such as Vercel. Middleware runs before a request reaches your route handlers, so a rate limiting rule defined there is enforced before your API code executes.

Here’s a conceptual example using a middleware.js file, with Upstash Redis as the shared store for the request counters:

// middleware.js
import { NextResponse } from 'next/server';
import { Ratelimit } from '@upstash/ratelimit'; // Example using Upstash
import { Redis } from '@upstash/redis';

// Initialize the Redis client from environment variables
// (UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN); never hard-code credentials
const redis = Redis.fromEnv();

const limiter = new Ratelimit({
  redis,
  limiter: Ratelimit.fixedWindow(15, '10 s'), // Allow 15 requests per 10-second window
  prefix: 'mw-api-data', // Unique key prefix for this limiter
});

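export async function middleware(request) {
  // request.ip exists only on Next.js 14 and earlier; it was removed in Next.js 15.
  // Fall back to the first x-forwarded-for entry set by the hosting platform.
  const ip =
    request.ip ?? request.headers.get('x-forwarded-for')?.split(',')[0]?.trim();

  if (!ip) {
    return NextResponse.next(); // Cannot determine IP, proceed without limiting
  }

  const { success, reset } = await limiter.limit(ip);

  if (!success) {
    // reset is a Unix timestamp in milliseconds marking when the window ends
    const retryAfter = Math.max(1, Math.ceil((reset - Date.now()) / 1000));
    return new NextResponse('Too Many Requests', {
      status: 429,
      headers: {
        'Retry-After': String(retryAfter), // Header values must be strings
      },
    });
  }

  // If rate limit is not exceeded, allow the request to proceed
  return NextResponse.next();
}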

// This tells Next.js to run this middleware only for routes under /api
export const config = {
  matcher: '/api/:path*',
};

In this setup:

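middleware.js: This file lives at the root of your project, at the same level as the pages or app directory (or inside src if you’re using that). Next.js automatically runs it for incoming requests that match config.matcher.
Edge Runtime: Middleware in Next.js runs on the Edge Runtime by default, a lightweight V8-based environment with lower startup overhead than a full Node.js server, at the cost of supporting only a subset of Node.js APIs. Low overhead matters for a check that runs on every matched request.
Rate limiter: We’re using @upstash/ratelimit as an example. You’d integrate with a fast, external store like Redis or a managed service. The fixed-window setting of 15 requests per 10 seconds means we allow at most 15 requests from a single IP within each fixed 10-second window. This is not a sliding window: the counter resets at each window boundary, so a client can briefly exceed the nominal rate by bursting at the end of one window and the start of the next.
IP Address: The client’s IP is the common identifier for rate limiting. request.ip provides it on Next.js 14 and earlier; on newer versions, read it from the x-forwarded-for header set by your hosting platform.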
limiter.limit(ip): This is the core operation. It checks if the IP has exceeded its limit. If success is false, the limit has been hit.
429 Too Many Requests: When the limit is exceeded, we return a 429 status code with a Retry-After header, telling the client to back off.
NextResponse.next(): If the limit is not exceeded, we allow the request to pass through to your actual API route.
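
To make the fixed-window behavior concrete, here is a minimal in-memory sketch of the counting logic. It is not how @upstash/ratelimit is implemented; the class name FixedWindowCounter and the helper retryAfterSeconds are made up for illustration, and a real deployment needs a shared store because each edge instance has its own memory.

```javascript
// Illustrative only: an in-memory fixed-window counter.
// FixedWindowCounter and retryAfterSeconds are hypothetical names.
class FixedWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;       // maximum requests allowed per window
    this.windowMs = windowMs; // window length in milliseconds
    this.buckets = new Map(); // identifier -> { windowStart, count }
  }

  // Records one request for `id` at time `now` (ms since epoch).
  check(id, now = Date.now()) {
    // Windows are aligned to multiples of windowMs, not to the first request.
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let bucket = this.buckets.get(id);
    if (!bucket || bucket.windowStart !== windowStart) {
      // A new window has begun: the count starts over.
      bucket = { windowStart, count: 0 };
      this.buckets.set(id, bucket);
    }
    bucket.count += 1;
    return {
      success: bucket.count <= this.limit,
      remaining: Math.max(0, this.limit - bucket.count),
      reset: windowStart + this.windowMs, // when the current window ends
    };
  }
}

// Whole seconds a blocked client should wait, for a Retry-After header.
function retryAfterSeconds(reset, now = Date.now()) {
  return Math.max(1, Math.ceil((reset - now) / 1000));
}
```

With a limit of 15 per 10 seconds, the 16th call inside a window returns success: false, and the count starts over once the window rolls over. That reset is why a client can send 15 requests just before a boundary and 15 more just after it.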

The problem this solves is preventing abuse and ensuring fair usage of your API resources. By enforcing limits at the edge, you offload the burden from your application servers. Imagine a DDoS attack or a poorly written bot; without edge rate limiting, your server would be the first to feel the pain. With it, the offending requests are blocked before they consume any significant server resources.

The internal workings rely on the fact that the Edge Runtime executes JavaScript in a V8 isolate close to your users, often within the same network infrastructure as your CDN. This proximity and lightweight execution environment make it ideal for tasks like request validation and rate limiting. The rate limiting state itself, usually held in a fast external datastore like Redis, is fetched with an awaited network call from the middleware, so the datastore’s round-trip latency is added to every request that passes through the limiter.

The exact levers you control are the rate (e.g., 15 requests), the time window (e.g., 10 seconds), and the identifier (e.g., IP address, though you could use API keys or other identifiers if available). You can also apply different limits to different routes or user groups.
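
As a rough sketch of how those levers could be wired together, the helpers below pick a limit per route and an identifier per request. Everything here is an assumption for illustration: the RULES table and its numbers, the x-api-key header name, and the function names ruleFor and identifierFor are not part of Next.js or any library.

```javascript
// Hypothetical per-route rules; the first matching prefix wins,
// so list more specific prefixes first.
const RULES = [
  { prefix: '/api/auth', limit: 5, windowSeconds: 60 },
  { prefix: '/api', limit: 15, windowSeconds: 10 },
];

// Returns the rule for a pathname, or null if the path is not rate limited.
function ruleFor(pathname) {
  const rule = RULES.find(
    (r) => pathname === r.prefix || pathname.startsWith(r.prefix + '/')
  );
  return rule ?? null;
}

// Prefers an API key (assumed to arrive in an x-api-key header) and falls
// back to the client IP. `headers` is anything with a get(name) method,
// such as request.headers in middleware.
function identifierFor(headers, fallbackIp) {
  const apiKey = headers.get('x-api-key');
  if (apiKey) return `key:${apiKey}`;

  const forwarded = headers.get('x-forwarded-for');
  const ip = forwarded ? forwarded.split(',')[0].trim() : fallbackIp;
  return ip ? `ip:${ip}` : null;
}
```

Prefixing the identifier with key: or ip: keeps the two namespaces from colliding in the shared store. In middleware you would call ruleFor(request.nextUrl.pathname) and pass the resulting identifier to whichever limiter is configured for that rule.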

A common misconception is that rate limiting is a simple "allow/deny" switch. In reality, effective rate limiting relies on algorithms such as fixed window, sliding window, token bucket, or leaky bucket, each with different burst behavior, and on careful consideration of distributed systems. For instance, using a shared Redis instance for rate limiting means that the rate limiter in your middleware must be configured with the correct connection details, and that the Redis server itself must be performant enough to handle the load of these checks from all your edge instances.
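
For comparison with the fixed window used earlier, here is a minimal single-process sketch of a token bucket. The TokenBucket class is hypothetical and keeps its state in memory, so it only illustrates the algorithm; in production the state would live in the shared store. @upstash/ratelimit also provides sliding-window and token-bucket limiters, so you would normally choose one of those rather than write your own.

```javascript
// Illustrative token bucket: holds up to `capacity` tokens and refills
// continuously at `refillPerSecond`. Each request spends one token.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full, so an initial burst is allowed
    this.lastRefill = now;  // timestamp (ms) of the last refill calculation
  }

  // Returns true if a token was available and spent, false otherwise.
  tryRemove(now = Date.now()) {
    // Add the tokens earned since the last call, capped at capacity.
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Unlike a fixed window, the bucket has no boundary at which the count resets: capacity bounds the largest burst, and refillPerSecond bounds the sustained rate.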

The next challenge you’ll likely encounter is managing more complex rate limiting scenarios, such as dynamic limits based on user tiers or implementing distributed rate limiting across multiple regions.