DataLoader lets you batch identical or overlapping database queries into a single, more efficient one.
Here’s a GraphQL schema in which each User has a posts field and each Post has an author.
type Post {
  id: ID!
  title: String!
  author: User!
}
type User {
  id: ID!
  name: String!
  posts: [Post!]!
}
type Query {
  user(id: ID!): User
  post(id: ID!): Post
}
And here’s how you might resolve the posts field for a User without DataLoader.
// GraphQL resolver for User.posts, without DataLoader
const resolvers = {
  User: {
    async posts(user) {
      // This runs once for *every* user whose posts are requested.
      // If you fetch 10 users, it can run 10 times, one query each.
      return db.posts.find({ authorId: user.id });
    },
  },
};
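To make the cost of the naive resolver concrete, here is a minimal sketch that counts queries. The in-memory `db` stub, the sample posts, and the `resolveAllPosts` helper are hypothetical stand-ins for illustration, not part of any real driver.

```javascript
// Hypothetical in-memory stand-in for the database; it counts queries.
let queryCount = 0;
const allPosts = [
  { id: 'p1', title: 'First', authorId: '1' },
  { id: 'p2', title: 'Second', authorId: '2' },
  { id: 'p3', title: 'Third', authorId: '1' },
];
const db = {
  posts: {
    async find({ authorId }) {
      queryCount += 1; // one simulated round trip per call
      return allPosts.filter((post) => post.authorId === authorId);
    },
  },
};

// The naive User.posts resolver, written as a plain function.
async function posts(user) {
  return db.posts.find({ authorId: user.id });
}

// Resolving posts for a list of users issues one query per user.
async function resolveAllPosts(users) {
  return Promise.all(users.map((user) => posts(user)));
}
```

Calling `resolveAllPosts` with three users leaves `queryCount` at 3, and the count grows linearly with the number of users.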
Now, let’s introduce DataLoader. The core idea is to create a DataLoader instance for each data-fetching function that might be called multiple times within a single GraphQL request.
import DataLoader from 'dataloader';
// Assume `db` is your database connection object
// Assume `db.users.findByIds` takes an array of IDs and returns an array of users in the same order.
const userLoader = new DataLoader(async (userIds) => {
  console.log('Batching user load for IDs:', userIds);
  const users = await db.users.findByIds(userIds);
  // DataLoader expects the results array to be in the *exact same order*
  // as the input `userIds` array. If a user isn't found, return `null` or `undefined`
  // for that position.
  const userMap = new Map(users.map(user => [user.id, user]));
  return userIds.map(id => userMap.get(id));
});
// Assume `db.posts.findByUserIds` takes an array of user IDs and returns posts for each.
// It's crucial that this function can handle fetching posts for *multiple* users at once.
const postsByUserLoader = new DataLoader(async (userIds) => {
  console.log('Batching post load for user IDs:', userIds);
  const posts = await db.posts.findByUserIds(userIds);
  // Group posts by author ID
  const postsMap = new Map();
  for (const post of posts) {
    if (!postsMap.has(post.authorId)) {
      postsMap.set(post.authorId, []);
    }
    postsMap.get(post.authorId).push(post);
  }
  // Return an array of arrays of posts, ordered by the input userIds
  return userIds.map(id => postsMap.get(id) || []);
});
// Your GraphQL resolvers using DataLoader
const resolvers = {
  User: {
    async posts(user) {
      // This doesn't immediately fetch anything. It queues a request.
      return postsByUserLoader.load(user.id);
    },
  },
  Query: {
    async user(parent, { id }) {
      return userLoader.load(id);
    },
  },
};
When a GraphQL query comes in, like:
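query {
  # Aliases are required: the same field cannot be selected twice
  # with different arguments under one response name.
  first: user(id: "1") {
    id
    name
    posts {
      id
      title
    }
  }
  second: user(id: "2") {
    id
    name
    posts {
      id
      title
    }
  }
}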
Here’s what happens:
- The Query.user resolver for id: "1" calls userLoader.load("1").
- The Query.user resolver for id: "2" calls userLoader.load("2").
- The User.posts resolver for user "1" calls postsByUserLoader.load("1").
- The User.posts resolver for user "2" calls postsByUserLoader.load("2").
Crucially, load calls made in the same tick of the event loop are collected rather than executed one by one. Each DataLoader instance runs its batch function only after the resolvers at the current level of the query have queued their keys.
Both Query.user resolvers run first, so the userLoader sees that "1" and "2" were requested and executes its batch function once with userIds = ["1", "2"]. Once the users have resolved, both User.posts resolvers run, and the postsByUserLoader likewise executes its batch function once with userIds = ["1", "2"].
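This turns four separate database calls (two user lookups and two post lookups) into two batched calls. The saving grows with the size of the query: fetching N users this way would take 2N calls without batching and still only two with it.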
The key to making this work is that your underlying data fetching functions (e.g., db.users.findByIds, db.posts.findByUserIds) must be able to accept an array of IDs and return results for all of them efficiently, ideally in a single database round trip. If your ORM or database driver doesn’t directly support fetching multiple records by ID in one go, you’ll need to write a function that does.
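If you do need to write such a function yourself, one common approach is a single query with an IN clause. The sketch below only builds the query. It assumes PostgreSQL-style numbered placeholders and a hypothetical helper name, `buildFindByIdsQuery`, so adapt the placeholder syntax and the execution call to your own driver.

```javascript
// Builds a parameterized "fetch many rows by ID" query.
// Assumes PostgreSQL-style numbered placeholders ($1, $2, ...).
// `table` must be a trusted identifier, never user input.
function buildFindByIdsQuery(table, ids) {
  if (ids.length === 0) {
    throw new Error('ids must not be empty');
  }
  const placeholders = ids.map((_, index) => `$${index + 1}`).join(', ');
  return {
    text: `SELECT * FROM ${table} WHERE id IN (${placeholders})`,
    values: [...ids],
  };
}

const query = buildFindByIdsQuery('users', ['1', '2', '3']);
console.log(query.text);
// prints SELECT * FROM users WHERE id IN ($1, $2, $3)
```

A query like this usually returns rows in whatever order the database chooses and omits IDs that don't exist, so the result still has to be realigned with the requested keys before it is handed back to DataLoader.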
The DataLoader constructor takes a batch function. This function receives an array of keys (in our case, IDs) and must return a Promise that resolves to an array of values. The critical constraint is that the returned array must have the same length as the input array of keys and be in the same order. If a specific key doesn’t have a corresponding value, you should put null or undefined at that position in the results array.
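The ordering constraint is easy to get wrong, so it can help to isolate it in one helper. The following is a minimal sketch; `alignToKeys` is a hypothetical name, and converting keys with `String()` is an assumption made here to cope with databases that return numeric IDs while GraphQL passes ID arguments as strings.

```javascript
// Reorders `rows` to match `keys`, filling missing positions with null.
// Keys are compared as strings so that 1 and "1" are treated as equal.
function alignToKeys(keys, rows, keyOf = (row) => row.id) {
  const byKey = new Map(rows.map((row) => [String(keyOf(row)), row]));
  return keys.map((key) => byKey.get(String(key)) ?? null);
}

// Rows as a database might return them: unordered, with ID "2" missing.
const rows = [{ id: 3, name: 'Cy' }, { id: 1, name: 'Ann' }];
const aligned = alignToKeys(['1', '2', '3'], rows);
console.log(aligned.map((user) => (user ? user.name : null)));
// prints [ 'Ann', null, 'Cy' ]
```

A batch function can then end with `return alignToKeys(userIds, users)`, which keeps the result the same length as the key array by construction.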
Consider a scenario where you have nested data. Suppose the schema also exposes a users field on Query, and your GraphQL query looks like this:
query {
  users {
    id
    name
    posts {
      id
      title
      author {
        id
        name
      }
    }
  }
}
Suppose users returns an array of 10 users, each user has posts, and each post has an author. Without DataLoader, you’d run 10 separate posts queries and then one more user query for every post returned. With DataLoader, all the postsByUserLoader.load() calls for the initial 10 users are batched into one query. Then, as the Post.author resolver is called for each post, the userLoader.load() calls are batched as well. If the same user authored multiple posts, DataLoader deduplicates the key, so that user is requested from the database only once for the Post.author resolution.
The mental model is that DataLoader acts as a caching and batching layer per request. Each DataLoader instance is unique to the context of a single GraphQL query execution. You typically create new DataLoader instances for each incoming request to ensure that caches don’t leak between requests and that batching is contained within a single request’s scope. This is often done in your request handling middleware.
// Example using Express
app.use('/graphql', (req, res, next) => {
  const userLoader = new DataLoader(/* ... */);
  const postsByUserLoader = new DataLoader(/* ... */);
  req.context = { userLoader, postsByUserLoader }; // Attach to request context
  next();
});
// In your resolver, access loaders from req.context. This assumes your
// GraphQL server passes the Express request as the resolver's context argument.
const resolvers = {
  Query: {
    async user(parent, { id }, req) {
      return req.context.userLoader.load(id);
    },
  },
};
The most surprising thing is that DataLoader doesn’t actually execute the batch function when you call load(). It queues the key and returns a Promise. By default, the batch function runs only after the current tick’s synchronous code and already-pending promise callbacks have finished, which in practice is after the resolvers at the current level of the query have all called load(). This means you can call loader.load() multiple times in different places within your resolver tree, and calls made in the same tick will be batched together.
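The deferral is easier to see in a stripped-down model. The sketch below is not the dataloader library's implementation: `TinyLoader` is a hypothetical class written only to illustrate the idea, and it uses `queueMicrotask` for scheduling, whereas the real library's scheduling differs in detail.

```javascript
// A simplified illustration of deferred batching with per-key caching.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];        // pending { key, resolve, reject } entries
    this.cache = new Map(); // key -> Promise, so repeated keys are reused
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Schedule a single dispatch when the first key of a batch is queued.
      if (this.queue.length === 1) {
        queueMicrotask(() => this.dispatch());
      }
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, index) => entry.resolve(values[index]));
    } catch (error) {
      batch.forEach((entry) => entry.reject(error));
    }
  }
}

const calls = []; // records the key array of every batch
const loader = new TinyLoader(async (keys) => {
  calls.push(keys);
  return keys.map((key) => `user-${key}`);
});

// Three load() calls in the same tick, one of them a duplicate key.
const pending = Promise.all([loader.load('1'), loader.load('2'), loader.load('1')]);
pending.then((results) => {
  console.log(results); // prints [ 'user-1', 'user-2', 'user-1' ]
  console.log(calls);   // prints [ [ '1', '2' ] ]
});
```

Nothing is fetched while the three `load()` calls run. The batch function fires once afterwards with the two distinct keys, and the duplicate call reuses the cached Promise.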
The next challenge is handling mutations and ensuring data consistency across batches, especially when creating or updating records.