DataLoader is a utility that provides a "batching" and "caching" mechanism for asynchronous data fetching, most commonly used with GraphQL to solve the N+1 query problem.

Let’s see this in action. Imagine a GraphQL schema where we have users and for each user, we want to fetch their posts. A naive resolver might look like this:

const resolvers = {
  Query: {
    users: async () => {
      const users = await db.getUsers(); // Fetches all users
      for (const user of users) {
        user.posts = await db.getPostsByUserId(user.id); // Fetches posts for each user
      }
      return users;
    },
  },
};

If we query for 10 users, this resolver executes db.getUsers() once, and then db.getPostsByUserId() 10 times. That’s 1 + 10 = 11 database queries. If we had 100 users, it would be 1 + 100 = 101 queries. This is the classic N+1 problem.
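The query count can be checked with a small stand-in database that counts its own calls. This is a minimal sketch: `createCountingDb`, `naiveUsersWithPosts`, and `countNaiveQueries` are hypothetical names introduced for illustration, and the in-memory data stands in for a real database.

```javascript
// Hypothetical in-memory stand-in for the database; it only counts calls.
function createCountingDb(userCount) {
  const db = {
    queryCount: 0,
    getUsers: async () => {
      db.queryCount += 1;
      return Array.from({ length: userCount }, (_, i) => ({ id: String(i + 1) }));
    },
    getPostsByUserId: async (userId) => {
      db.queryCount += 1;
      return [{ id: `p-${userId}`, userId }];
    },
  };
  return db;
}

// Same shape as the naive resolver: one posts query per user inside the loop.
async function naiveUsersWithPosts(db) {
  const users = await db.getUsers();
  for (const user of users) {
    user.posts = await db.getPostsByUserId(user.id);
  }
  return users;
}

// Runs the naive approach and reports how many queries it issued.
async function countNaiveQueries(userCount) {
  const db = createCountingDb(userCount);
  await naiveUsersWithPosts(db);
  return db.queryCount; // 1 query for users + 1 per user
}

countNaiveQueries(10).then((count) => console.log(`10 users -> ${count} queries`));
```

The count grows linearly with the number of users, which is the pattern the rest of this article removes.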

Now, let’s introduce DataLoader. DataLoader wraps our data-fetching functions and intelligently batches requests that happen within the same tick of the event loop.

First, we define a DataLoader for fetching posts by user ID:

import DataLoader from 'dataloader';

const batchPosts = async (userIds) => {
  // userIds is an array of keys, e.g. ['1', '5', '2', '8'] (same type as post.userId)
  const posts = await db.getPostsByIds(userIds); // Fetches all posts for the given userIds in ONE query
  // Group the posts by userId so the results can be returned in the order of userIds
  const postsMap = new Map();
  for (const post of posts) {
    if (!postsMap.has(post.userId)) {
      postsMap.set(post.userId, []);
    }
    postsMap.get(post.userId).push(post);
  }
  return userIds.map(id => postsMap.get(id) || []); // Return in the requested order
};

const postsLoader = new DataLoader(batchPosts);

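The batchPosts function is the core. It receives an array of userIds and is responsible for fetching all the corresponding posts in a single database call. The crucial part is that it must return an array of the same length as userIds, with each result at the same position as the key it belongs to. DataLoader matches results to keys by index, so a missing or reordered entry would hand one user's posts to another.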
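To make the ordering contract concrete, the sketch below applies the same grouping logic to rows that arrive in a different order than the keys, including one key with no rows at all. `alignToKeys` is a hypothetical helper name and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: groups rows by key and returns one entry per requested
// key, in the order the keys were requested.
function alignToKeys(keys, rows, keyOf) {
  const groups = new Map();
  for (const row of rows) {
    const key = keyOf(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  // Keys with no rows get an empty array so the output length equals keys.length.
  return keys.map((key) => groups.get(key) ?? []);
}

const requestedIds = ['3', '1', '2'];
// Rows as a database might return them: not in the requested order, none for '2'.
const rows = [
  { id: 'p1', userId: '1' },
  { id: 'p4', userId: '3' },
  { id: 'p2', userId: '1' },
];

const aligned = alignToKeys(requestedIds, rows, (post) => post.userId);
console.log(aligned.map((posts) => posts.map((p) => p.id)));
// [ [ 'p4' ], [ 'p1', 'p2' ], [] ]
```

Map lookups are type-sensitive, so the keys passed to load() and the key read from each row should have the same type (for example, both strings); otherwise every lookup misses and each key silently gets an empty array.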

Now, we integrate this DataLoader into our GraphQL resolvers. Typically, you’d create a new DataLoader instance for each request (for example, in the server’s context function) so that cached results are never shared between requests and batching stays within that request’s scope.

import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';
import DataLoader from 'dataloader';

// ... (define batchPosts and postsLoader as above)

const typeDefs = `#graphql
  type Post {
    id: ID!
    title: String!
    userId: ID!
  }

  type User {
    id: ID!
    name: String!
    posts: [Post!]!
  }

  type Query {
    users: [User!]!
  }
`;

const resolvers = {
  Query: {
    users: async () => {
      const users = await db.getUsers(); // Still 1 query for users
      return users;
    },
  },
  User: {
    posts: async (user, _args, context) => {
      // load() only queues the key. DataLoader collects the keys queued
      // within the same tick and passes the unique user IDs to batchPosts ONCE.
      return context.postsLoader.load(user.id);
    },
  },
};

const server = new ApolloServer({ typeDefs, resolvers });

const { url } = await startStandaloneServer(server, {
  // Build a fresh loader for every request so cached results are never shared
  context: async () => ({ postsLoader: new DataLoader(batchPosts) }),
  listen: { port: 4000 },
});

console.log(`🚀 Server ready at ${url}`);

// Mock database functions for demonstration
const mockDb = {
  users: [
    { id: '1', name: 'Alice' },
    { id: '2', name: 'Bob' },
    { id: '3', name: 'Charlie' },
  ],
  posts: [
    { id: 'p1', title: 'Alice Post 1', userId: '1' },
    { id: 'p2', title: 'Alice Post 2', userId: '1' },
    { id: 'p3', title: 'Bob Post 1', userId: '2' },
    { id: 'p4', title: 'Charlie Post 1', userId: '3' },
    { id: 'p5', title: 'Charlie Post 2', userId: '3' },
  ],
  getUsers: async () => {
    console.log('--- DB CALL: getUsers ---');
    await new Promise(resolve => setTimeout(resolve, 50)); // Simulate latency
    return mockDb.users;
  },
  getPostsByIds: async (userIds) => {
    console.log(`--- DB CALL: getPostsByIds(${userIds.join(', ')}) ---`);
    await new Promise(resolve => setTimeout(resolve, 50)); // Simulate latency
    return mockDb.posts.filter(post => userIds.includes(post.userId));
  },
};

// Replace mockDb with actual database calls in a real app
const db = mockDb;

When a GraphQL query like this is executed:

query {
  users {
    id
    name
    posts {
      title
    }
  }
}

1. The Query.users resolver runs, calling db.getUsers(). This is one query.
2. The GraphQL execution engine then needs to resolve posts for each of the returned users.
3. For each user, the User.posts resolver calls load(user.id) on the posts loader.
4. Because these calls are made within the same tick of the event loop, DataLoader queues the user.id values instead of fetching anything immediately.
5. When the current tick finishes, DataLoader dispatches the batch: it calls our batchPosts function once with an array of all the unique user IDs that were requested.
6. batchPosts executes a single db.getPostsByIds(userIds) query.
7. DataLoader resolves each pending load() promise with the result at the matching index, and the GraphQL engine uses those values to resolve the posts field for each user.

The total number of database queries for fetching users and their posts becomes 1 (for users) + 1 (for all posts) = 2, regardless of the number of users. This is a dramatic improvement.
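The queue-then-dispatch mechanism in the steps above can be sketched in a few lines. This is a simplified stand-in written for illustration, not DataLoader's actual implementation: `TinyLoader` and `demoBatching` are hypothetical names, and the class leaves out caching, key de-duplication, and per-key error handling. It assumes Node.js, since it uses `process.nextTick`.

```javascript
// Simplified stand-in for the batching idea (NOT DataLoader's real code).
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];
  }

  // Queues the key and returns a promise that settles when the batch completes.
  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key queued in a tick schedules exactly one dispatch.
      if (this.queue.length === 1) {
        process.nextTick(() => this.dispatch());
      }
    });
  }

  // Sends every queued key to batchFn in one call and fans the results out by index.
  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, index) => entry.resolve(values[index]));
    } catch (error) {
      batch.forEach((entry) => entry.reject(error));
    }
  }
}

// Three load() calls in the same tick, as three sibling User.posts resolvers would make.
async function demoBatching() {
  const calls = [];
  const loader = new TinyLoader(async (ids) => {
    calls.push(ids); // record each batch the "database" receives
    return ids.map((id) => `posts-for-${id}`);
  });
  const results = await Promise.all(['1', '2', '3'].map((id) => loader.load(id)));
  return { calls, results };
}

demoBatching().then(({ calls, results }) => console.log(calls, results));
```

The batch function is invoked once with all three keys, and each caller still receives only the value for its own key.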

DataLoader also provides caching, scoped to each loader instance. If the same userId is loaded more than once during a loader’s lifetime (for example, when the same user appears in several places in the response tree), DataLoader returns the cached promise from the first load, so the key is not sent to the batching function again. With one loader per request, this cache lasts only for that request. A loader that outlives a request would also answer later queries from its cache, which risks serving stale data.
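The caching behaviour amounts to memoizing the promise for each key. The sketch below shows that idea in isolation; `createCachingLoader` is a hypothetical stand-in written for illustration, not DataLoader's implementation, and it omits batching entirely.

```javascript
// Hypothetical stand-in: caches the promise returned for each key.
function createCachingLoader(fetchOne) {
  const cache = new Map();
  return {
    load(key) {
      if (!cache.has(key)) cache.set(key, fetchOne(key));
      return cache.get(key);
    },
    clear(key) {
      cache.delete(key);
    },
  };
}

let fetchCount = 0;
const loader = createCachingLoader(async (userId) => {
  fetchCount += 1;
  return [`post-of-${userId}`];
});

const first = loader.load('1');
const second = loader.load('1'); // served from the cache, no second fetch
console.log(first === second, fetchCount); // true 1

loader.clear('1'); // e.g. after a mutation changes the user's posts
loader.load('1');
console.log(fetchCount); // 2
```

DataLoader itself exposes `clear(key)`, `clearAll()`, and `prime(key, value)` for managing its cache, which is useful after mutations within the same request.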

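The most surprising thing about DataLoader is that it doesn’t magically know when to batch. It relies on the JavaScript event loop: only load() calls made within the same tick are collected into one batch. If you await a real asynchronous operation (such as a database call or an earlier load()) between calls to loader.load(), the first batch is dispatched before the next key is queued, so the keys end up in separate batches. To keep batching effective, issue the loader.load() calls for sibling entities within the same tick, for example by letting each field resolver call load() directly, or by using Promise.all or loader.loadMany() instead of awaiting inside a loop. Creating a new DataLoader instance per request serves a different purpose: it keeps the cache from being shared between requests.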
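The effect of awaiting between load() calls can be observed with a small experiment. The loader here is a minimal stand-in written for illustration (not DataLoader itself), and `createTickLoader`, `createRecordingLoader`, `loadSequentially`, and `loadTogether` are hypothetical names. It assumes Node.js for `process.nextTick`.

```javascript
// Minimal stand-in loader: keys queued in the same tick are sent to batchFn
// together on the next tick.
function createTickLoader(batchFn) {
  let queue = [];
  return {
    load(key) {
      return new Promise((resolve, reject) => {
        queue.push({ key, resolve, reject });
        if (queue.length === 1) {
          process.nextTick(() => {
            const batch = queue;
            queue = [];
            batchFn(batch.map((entry) => entry.key)).then(
              (values) => batch.forEach((entry, i) => entry.resolve(values[i])),
              (error) => batch.forEach((entry) => entry.reject(error)),
            );
          });
        }
      });
    },
  };
}

// Builds a loader whose batch function records every batch it receives.
function createRecordingLoader() {
  const batches = [];
  const loader = createTickLoader(async (ids) => {
    batches.push(ids);
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated I/O
    return ids.map((id) => `posts-for-${id}`);
  });
  return { loader, batches };
}

// Awaiting each load before issuing the next one: every key gets its own batch.
async function loadSequentially(ids) {
  const { loader, batches } = createRecordingLoader();
  for (const id of ids) {
    await loader.load(id);
  }
  return batches;
}

// Issuing all loads in the same tick: a single batch.
async function loadTogether(ids) {
  const { loader, batches } = createRecordingLoader();
  await Promise.all(ids.map((id) => loader.load(id)));
  return batches;
}

loadSequentially(['1', '2', '3']).then((b) => console.log('sequential:', b.length, 'batches'));
loadTogether(['1', '2', '3']).then((b) => console.log('together:', b.length, 'batches'));
```

With three keys, the sequential version produces three batches of one key each, while the Promise.all version produces one batch of three keys.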

The next concept you’ll encounter is how to manage DataLoader instances effectively across different layers of your application, especially in complex microservice architectures or when dealing with subscriptions.
