Paging is fundamental to handling large datasets in APIs, but the common strategies of offset/limit and page number are surprisingly broken for real-world use.
Let’s see how this plays out in practice. Imagine a simple API endpoint /items that returns a list of items.
// GET /items?limit=10&offset=20
{
"items": [
{"id": 21, "name": "Item 21"},
{"id": 22, "name": "Item 22"},
// ... 8 more items
{"id": 30, "name": "Item 30"}
],
"total": 1000,
"next_offset": 30
}
This looks straightforward. You want the next page? Just increment the offset by the limit. But what happens if items are added or deleted between requests?
Consider this scenario:
- You request page 3 (
limit=10,offset=20) and get items 21-30. - In the meantime, someone deletes item 25 and adds item 101.
- You request page 4 (
limit=10,offset=30).
Your API, using offset=30, will now return items 31-40. But because item 25 was deleted, item 26 is now the 25th item in the database. Item 30 (which you received in the previous page) might now be the 29th item overall. Your "page 4" might actually contain items that were logically part of your original page 3, or it might skip items entirely if new ones were added at the beginning of the result set. This makes pagination unreliable for dynamic data.
Page-numbering suffers the exact same problem. If page 1 contains items 1-10, and page 2 contains items 11-20, and then item 5 is deleted, your page 2 request (which still asks for items 11-20) might now return items 10-19 (if your underlying query is simply "give me the 11th through 20th remaining items"). You’ve lost item 10 and repeated item 19.
This is where cursor-based pagination shines. Instead of asking for a position (like offset or page number), you ask for items after a specific item.
Here’s how a cursor-based API might look:
// GET /items?limit=10&after_id=20
{
"items": [
{"id": 21, "name": "Item 21"},
{"id": 22, "name": "Item 22"},
// ... 8 more items
{"id": 30, "name": "Item 30"}
],
"next_cursor": "20:30" // A token representing "after item ID 30"
}
The after_id=20 tells the API, "Give me the next 10 items after the item with ID 20." The next_cursor is a opaque token, often a base64 encoded string of {"id": 30, "timestamp": "2023-10-27T10:30:00Z"} or similar, that the client passes back in the next request:
// GET /items?limit=10&cursor=20:30
{
"items": [
{"id": 31, "name": "Item 31"},
{"id": 32, "name": "Item 32"},
// ... 8 more items
{"id": 40, "name": "Item 40"}
],
"next_cursor": "30:40" // A token representing "after item ID 40"
}
The key is that the cursor isn’t just an ID. It’s a reference point that anchors your request to a specific item in the sorted list. When items are added or deleted, the relative order of the remaining items doesn’t change. If item 25 was deleted, item 26 still comes after item 24. Your request for items after item 20 will correctly fetch items 21 through 30, even if item 25 is gone.
To implement this, your database query needs to be stable. A common pattern is to sort by a unique, ever-increasing field (like an id or a created_at timestamp) and then by the primary key for tie-breaking.
For example, to get items after item_id = 20 and created_at = '2023-10-27T10:30:00Z', your SQL might look something like this:
SELECT id, name, created_at
FROM items
WHERE (created_at > '2023-10-27T10:30:00Z')
OR (created_at = '2023-10-27T10:30:00Z' AND id > 20)
ORDER BY created_at ASC, id ASC
LIMIT 10;
The cursor 20:30 would encode id=20 and created_at='2023-10-27T10:30:00Z'. The next_cursor would be generated from the last item returned by the query.
The "next cursor" is often a base64 encoded JSON string like eyJrZXkiOiJpbmNyZW1lbnRzIiwidmFsdWUiOiJkZWxldGVkIiwidGltZXN0YW1wIjoiMjAyMy0xMC0yNyAxMDozMDowMFoifQ==. This is actually the base64 encoding of {"key": "increments", "value": "deleted", "timestamp": "2023-10-27 10:30:00Z"}. The server decodes this to know exactly where to resume.
The real trick to making cursor pagination robust is ensuring your cursor contains enough information to reconstruct the exact query that returned the previous page, including tie-breaking values. If you only use the ID, and two items have the same created_at timestamp, you might still have issues. Encapsulating the last item’s created_at and id (or whatever fields you’re sorting by) into the cursor is paramount.
You’ll find that using cursor-based pagination, especially with a stable sort order that includes a unique identifier for tie-breaking, makes your API much more resilient to data changes and provides a smoother experience for your clients.
The next hurdle is handling efficient querying for the previous page, which cursor pagination alone doesn’t easily support.