GraphQL resolvers often become the bottleneck when fetching data from databases, especially when dealing with many-to-one or many-to-many relationships.

Let’s see how this plays out in a typical scenario. Imagine a GraphQL schema for a blog:

type Author {
  id: ID!
  name: String!
  posts: [Post!]!
}

type Post {
  id: ID!
  title: String!
  author: Author!
  comments: [Comment!]!
}

type Comment {
  id: ID!
  text: String!
  post: Post!
}

type Query {
  author(id: ID!): Author
  posts: [Post!]!
}

A naive resolver for Author.posts might look like this:

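// Naive resolver: one posts query per author.
// Assumes context.db.query(text, values) resolves to an array of rows; the $1
// placeholder keeps parent.id out of the SQL string (no injection risk).
const AuthorResolver = {
  posts: (parent, args, context) => {
    return context.db.query('SELECT * FROM posts WHERE author_id = $1', [parent.id]);
  }
};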

If a query asks for one author and their posts, and then another author and their posts, this resolver executes a separate SQL query for each author. Fetch 10 authors through a list field and you get 1 query for the list plus 10 posts queries; fetch 100 and you get 1 + 100. This is the N+1 problem in action: one query for the parents, then N more, one per parent.
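To make the counting concrete, here is a minimal sketch that runs naive resolution against a hypothetical in-memory database that logs every query it receives. createFakeDb, allAuthors, postsByAuthor and resolveAuthorsWithPosts are illustrative names, not part of any library, and the schema above has no list-of-authors field, so the list used here is an assumption.

```javascript
// Hypothetical in-memory stand-in for context.db that records every query.
// allAuthors() and postsByAuthor() are illustrative names, not a real driver API.
function createFakeDb() {
  const queryLog = [];
  const authors = [
    { id: 1, name: 'Ada' },
    { id: 2, name: 'Grace' },
    { id: 3, name: 'Edsger' },
  ];
  const posts = [
    { id: 10, title: 'Engines', author_id: 1 },
    { id: 11, title: 'Compilers', author_id: 2 },
    { id: 12, title: 'COBOL', author_id: 2 },
  ];
  const db = {
    async allAuthors() {
      queryLog.push('SELECT * FROM authors');
      return authors;
    },
    async postsByAuthor(authorId) {
      // The string is only a log label; no SQL is executed here.
      queryLog.push(`SELECT * FROM posts WHERE author_id = ${authorId}`);
      return posts.filter((p) => p.author_id === authorId);
    },
  };
  return { db, queryLog };
}

// Naive resolution: the list costs 1 query, then each author's posts field
// issues its own query, so N authors cost 1 + N round trips.
async function resolveAuthorsWithPosts(db) {
  const authors = await db.allAuthors();
  return Promise.all(
    authors.map(async (author) => ({
      ...author,
      posts: await db.postsByAuthor(author.id),
    })),
  );
}

const { db, queryLog } = createFakeDb();
resolveAuthorsWithPosts(db).then(() => {
  console.log(queryLog.length); // 4: one authors query plus three posts queries
});
```

With three authors the log holds four entries; adding authors grows the posts queries one for one.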

GraphQL execution resolves each field independently. When the server receives a query, it walks the selection set and calls the resolver for each field. If a resolver needs related data, it typically makes its own database call. This independence keeps resolvers simple and composable, but it means related fetches are not batched or optimized automatically: the database driver or ORM has no way of knowing that several queries against the same table with different author_id values could be combined into one.

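Here’s a query and how the naive resolvers would execute it. The aliases first and second are required, because GraphQL rejects two selections of the author field with different arguments under the same response name:

query GetAuthorsAndTheirPosts {
  first: author(id: "1") {
    id
    name
    posts {
      id
      title
    }
  }
  second: author(id: "2") {
    id
    name
    posts {
      id
      title
    }
  }
}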

This would result in the following database queries:

  1. SELECT * FROM authors WHERE id = 1; (by the Query.author resolver, not shown)
  2. SELECT * FROM posts WHERE author_id = 1; (by AuthorResolver.posts for author 1)
  3. SELECT * FROM authors WHERE id = 2; (by the Query.author resolver, not shown)
  4. SELECT * FROM posts WHERE author_id = 2; (by AuthorResolver.posts for author 2)

Notice the two SELECT * FROM posts queries. If the query asked for 10 authors, you’d have 10 such queries.

The solution is DataLoader, a utility library that provides batching and caching for asynchronous data loading. It sits between your resolvers and your data source. When a resolver needs data for a specific key (like an author_id), it calls load(key) instead of querying the database, and DataLoader queues the key. DataLoader coalesces all load() calls made within a single frame of execution (one tick of the event loop), then calls your batch function once with all the keys collected (duplicates are removed by its cache). That batch function makes a single database query for all of them.
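The batching mechanism can be sketched in a few lines. TinyLoader below is a hypothetical, stripped-down stand-in, not the dataloader package: it has no cache, no per-key error handling and no options. It mirrors DataLoader’s default Node.js scheduling by deferring the dispatch until promise jobs that are already queued have run, then calling process.nextTick.

```javascript
// TinyLoader: a hypothetical, stripped-down batcher showing the core DataLoader
// idea. Not the dataloader package: no caching, no per-key errors, no options.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise<array of values aligned with keys>
    this.queue = [];        // pending { key, resolve, reject } for the current tick
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (this.queue.length === 1) {
        // First key of a new batch: schedule one dispatch after the promise
        // jobs already queued, then on process.nextTick (Node.js only).
        Promise.resolve().then(() => process.nextTick(() => this.dispatch()));
      }
    });
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      if (values.length !== batch.length) {
        throw new Error('batch function must return one value per key');
      }
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

// Usage: three load() calls in the same tick become one batch call.
const batchCalls = [];
const authorLoader = new TinyLoader(async (ids) => {
  batchCalls.push(ids);
  return ids.map((id) => ({ id, name: `Author ${id}` }));
});

Promise.all([authorLoader.load(1), authorLoader.load(2), authorLoader.load(3)]).then((authors) => {
  console.log(authors.map((a) => a.name)); // [ 'Author 1', 'Author 2', 'Author 3' ]
  console.log(batchCalls);                 // [ [ 1, 2, 3 ] ]
});
```

The real library additionally deduplicates repeated keys through its cache and supports options such as maxBatchSize; the sketch leaves those out so the batching step stays visible.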

Let’s refactor the AuthorResolver.posts using DataLoader:

First, set up the DataLoader instance in your application’s context, ensuring it’s created per request.

// In your server setup (e.g., Apollo Server context function)
const DataLoader = require('dataloader');

const createLoaders = (db) => ({
  postsByAuthorId: new DataLoader(async (authorIds) => {
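    // This is the single query that DataLoader triggers.
    // Numbered placeholders keep the IDs out of the SQL string (PostgreSQL-style
    // syntax; adjust to your driver). Assumes db.query(text, values) resolves to rows.
    const placeholders = authorIds.map((_, i) => `$${i + 1}`).join(', ');
    const posts = await db.query(
      `SELECT * FROM posts WHERE author_id IN (${placeholders})`,
      [...authorIds]
    );

    // The result MUST contain one entry per key, in the same order as authorIds.
    // Map lookups are type-sensitive, so post.author_id and the keys must share
    // a type (1 and "1" do not match).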
    const postsMap = new Map();
    posts.forEach(post => {
      if (!postsMap.has(post.author_id)) {
        postsMap.set(post.author_id, []);
      }
      postsMap.get(post.author_id).push(post);
    });

    return authorIds.map(authorId => postsMap.get(authorId) || []);
  }),
  // Other DataLoaders...
});

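// In your ApolloServer constructor (Apollo Server 2/3 API from the apollo-server package):
const { ApolloServer } = require('apollo-server');

const db = getDatabaseConnection(); // Your DB connection logic (e.g. a pool), shared by all requests

const server = new ApolloServer({
  typeDefs,
  resolvers,
  // The context function runs once per request, so each request gets fresh loaders.
  context: () => ({
    db,
    loaders: createLoaders(db),
  }),
});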
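The batch function has two jobs: build one query covering every key, and return results aligned one-to-one with the input keys. Both can be checked without a database. buildPostsQuery and groupRowsByKey are hypothetical helper names, and the $1, $2 placeholder syntax assumes a PostgreSQL-style driver.

```javascript
// Hypothetical helper: one parameterized IN query for all keys in the batch.
function buildPostsQuery(authorIds) {
  // $1, $2, ... placeholders (PostgreSQL style; adjust for your driver).
  const placeholders = authorIds.map((_, i) => `$${i + 1}`).join(', ');
  return {
    text: `SELECT * FROM posts WHERE author_id IN (${placeholders})`,
    values: authorIds,
  };
}

// Hypothetical helper: regroup flat rows so the output has one entry per
// input key, in input order, as a DataLoader batch function must return.
function groupRowsByKey(keys, rows, keyOf) {
  const groups = new Map();
  for (const row of rows) {
    const key = keyOf(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  // Keys with no matching rows get an empty array instead of undefined.
  return keys.map((key) => groups.get(key) || []);
}

const query = buildPostsQuery([2, 7, 1]);
console.log(query.text); // SELECT * FROM posts WHERE author_id IN ($1, $2, $3)

const rows = [
  { id: 10, author_id: 1 },
  { id: 11, author_id: 2 },
  { id: 12, author_id: 2 },
];
console.log(groupRowsByKey([2, 7, 1], rows, (r) => r.author_id).map((g) => g.length));
// [ 2, 0, 1 ]
```

Lookups are type-sensitive: a key of "1" (a GraphQL ID string) will not match an author_id of 1, which is a common source of posts silently coming back empty.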

Then, modify the resolver to use the DataLoader:

// Optimized resolver using DataLoader
const AuthorResolver = {
  posts: (parent, args, context) => {
    // parent.id is the author's ID for the current parent object
    return context.loaders.postsByAuthorId.load(parent.id);
  }
};

Now, when the same GetAuthorsAndTheirPosts query is executed, the following happens:

  1. The Query.author resolvers for IDs "1" and "2" run and return their authors.
  2. AuthorResolver.posts is called for author 1. It calls context.loaders.postsByAuthorId.load(1), which queues key 1 and returns a promise instead of querying the database.
  3. AuthorResolver.posts is called for author 2 and calls load(2), which queues key 2.
  4. The current tick of the event loop finishes. DataLoader sees that load(1) and load(2) were called during it.
  5. DataLoader invokes its batch function once with [1, 2].
  6. The batch function executes one query equivalent to SELECT * FROM posts WHERE author_id IN (1, 2).
  7. DataLoader maps the results back to the original requests: load(1) resolves to the posts for author 1, and load(2) to the posts for author 2.

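In this example the two posts queries from the naive version collapse into one, so the request makes 3 database queries instead of 4 (the two author lookups are unchanged). With N authors, the N posts queries become a single query. Batching only happens when the load() calls land in the same tick: if the two authors arrive from separate database round trips, their posts can be requested in different ticks and end up in separate batches. Loading the authors through a DataLoader of their own helps keep them in step.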

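DataLoader also caches by default. Once a loader has seen a key, later load() calls with that key return the same promise instead of triggering another fetch. Because that cache lives as long as the loader, creating loaders per request keeps one user’s data from leaking into another request and avoids serving stale results. Within a request, the cache pays off in nested queries where the same parent is reached more than once, such as one author appearing under several posts.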
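At heart, the cache is a map from key to promise. The sketch below shows that memoization on its own, without any batching; createRequestCache and fetchAuthor are hypothetical names. Creating one cache per request, as the context function does with createLoaders(db), means repeated loads within a request share a result while separate requests fetch independently.

```javascript
// Hypothetical per-request cache showing the memoization DataLoader applies by
// default: the first load(key) stores the promise, later calls reuse it.
// No batching happens here.
function createRequestCache(fetchFn) {
  const cache = new Map(); // key -> Promise<value>
  return {
    load(key) {
      if (!cache.has(key)) cache.set(key, fetchFn(key));
      return cache.get(key);
    },
  };
}

let fetches = 0;
// Hypothetical fetcher standing in for a database lookup.
const fetchAuthor = async (id) => {
  fetches += 1;
  return { id, name: `Author ${id}` };
};

// One cache per request, like calling createLoaders(db) in the context function.
const requestA = createRequestCache(fetchAuthor);
const requestB = createRequestCache(fetchAuthor);

Promise.all([requestA.load(1), requestA.load(1), requestB.load(1)]).then(([first, again]) => {
  console.log(first === again); // true: request A reuses one promise for key 1
  console.log(fetches);         // 2: one fetch in request A, one in request B
});
```

The real DataLoader also exposes clear(key) and clearAll() for invalidating entries, for example after a mutation changes the underlying rows.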

The core idea is to transform individual, synchronous-looking data fetches into batched, asynchronous operations. This aligns perfectly with the asynchronous nature of I/O operations like database calls and leverages the fact that databases are highly optimized for fetching multiple rows in a single query.

The most surprising thing about DataLoader is how it decides when to batch: it uses no timers and does no polling. By default in Node.js it schedules a single dispatch for the end of the current tick of the event loop (after promise jobs that are already queued, via process.nextTick). Every load() call made before that dispatch runs joins the same batch. This means that even though your resolvers call load() one key at a time, in a seemingly synchronous style, DataLoader can still batch their asynchronous operations effectively.
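A small experiment makes the tick boundary visible. createTickBatcher is a hypothetical helper that only collects keys; it schedules its flush the way DataLoader does by default in Node.js (after queued promise jobs, then process.nextTick). Keys enqueued synchronously or from promise callbacks in the same tick share a batch, while a key enqueued from a later setImmediate callback starts a new one.

```javascript
// Hypothetical helper: collects keys and hands each tick's keys to onFlush.
// Scheduling mirrors DataLoader's Node.js default: wait for promise jobs that
// are already queued, then flush on process.nextTick.
function createTickBatcher(onFlush) {
  let pending = [];
  return function enqueue(key) {
    pending.push(key);
    if (pending.length === 1) {
      Promise.resolve().then(() =>
        process.nextTick(() => {
          const batch = pending;
          pending = [];
          onFlush(batch);
        }),
      );
    }
  };
}

const batches = [];
const enqueue = createTickBatcher((batch) => batches.push(batch));

enqueue(1);
enqueue(2);                                  // same synchronous frame
Promise.resolve().then(() => enqueue(3));    // promise job in the same tick
setImmediate(() => {
  enqueue(4);                                // a later turn of the event loop
  setImmediate(() => console.log(batches));  // [ [ 1, 2, 3 ], [ 4 ] ]
});
```

This is also why parent objects that arrive from separate database round trips can produce separate batches for their children.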

The next challenge is often dealing with complex joins or when your data model doesn’t map cleanly to simple IN clauses, requiring more sophisticated batching strategies.
