Query GitHub Data at Scale with the GraphQL API (2026)

The GitHub GraphQL API doesn’t just let you fetch data; it lets you fetch exactly the data you need, avoiding the over-fetching and under-fetching problems common with REST APIs.

Let’s see it in action. Imagine you want to find all pull requests for a specific repository that were opened in the last 30 days, are still open, and have more than 5 comments. With a REST API, you might make several requests: one for the list of PRs, then one per PR to get its comment count, and then filter. With GraphQL, a single query returns each open PR together with its creation date and comment count, leaving only the comparison against your two thresholds to the client.

query($repoOwner: String!, $repoName: String!) {
  repository(owner: $repoOwner, name: $repoName) {
    pullRequests(
      first: 100,
      states: OPEN,
      # pullRequests takes no date argument, so sort newest first
      # and compare createdAt against your cutoff on the client.
      orderBy: {field: CREATED_AT, direction: DESC}
    ) {
      edges {
        node {
          title
          url
          createdAt
          comments { # Only the count is selected, so no first/last is needed
            totalCount
          }
        }
      }
    }
  }
}

To run this, you’d use a tool like curl or a GraphQL client, passing in variables:

curl -H "Authorization: Bearer YOUR_GITHUB_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST \
     -d '{
           "query": "query($repoOwner: String!, $repoName: String!) { repository(owner: $repoOwner, name: $repoName) { pullRequests(first: 100, states: OPEN, orderBy: {field: CREATED_AT, direction: DESC}) { edges { node { title url createdAt comments { totalCount } } } } } }",
           "variables": {
             "repoOwner": "octocat",
             "repoName": "Spoon-Knife"
           }
         }' \
     https://api.github.com/graphql
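
For readers who prefer a script to curl, the same request can be sketched in Python with only the standard library. The helper names build_request and run_query are illustrative, not part of any GitHub client; the endpoint and headers are the ones the curl command uses.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


def build_request(token, query, variables):
    """Build (but do not send) the POST request that the curl command issues."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(token, query, variables):
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(token, query, variables)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect the payload without a token or a network connection.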

This query structure reveals the core power: nested data fetching. You start at the repository and can traverse down to pullRequests, then to node (each individual PR), and even query nested fields like comments and their totalCount. This allows you to define precisely the shape of the data you need, no more, no less.
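
Because the response mirrors the shape of the query, applying the "last 30 days" and "more than 5 comments" conditions is a short walk over the result. The following is a minimal sketch; the function names and the sample data are made up for illustration, and the sample is hand-written rather than real API output.

```python
from datetime import datetime, timedelta, timezone


def last_30_days_cutoff(now=None):
    """Return the timestamp 30 days before `now` (defaults to the current UTC time)."""
    return (now or datetime.now(timezone.utc)) - timedelta(days=30)


def filter_pull_requests(response, cutoff, min_comments=5):
    """Return PR nodes created at or after `cutoff` with more than `min_comments` comments."""
    edges = response["data"]["repository"]["pullRequests"]["edges"]
    matches = []
    for edge in edges:
        node = edge["node"]
        # Timestamps arrive as ISO 8601 strings ending in "Z"; normalise for fromisoformat.
        created = datetime.fromisoformat(node["createdAt"].replace("Z", "+00:00"))
        if created >= cutoff and node["comments"]["totalCount"] > min_comments:
            matches.append(node)
    return matches


# Hand-written sample in the shape the query returns (not real API output).
sample_response = {"data": {"repository": {"pullRequests": {"edges": [
    {"node": {"title": "Recent and busy", "url": "https://example.invalid/1",
              "createdAt": "2023-11-01T09:30:00Z", "comments": {"totalCount": 8}}},
    {"node": {"title": "Recent but quiet", "url": "https://example.invalid/2",
              "createdAt": "2023-11-02T12:00:00Z", "comments": {"totalCount": 2}}},
    {"node": {"title": "Busy but old", "url": "https://example.invalid/3",
              "createdAt": "2023-09-01T08:00:00Z", "comments": {"totalCount": 12}}},
]}}}}

cutoff = datetime(2023, 10, 27, tzinfo=timezone.utc)
recent_busy = filter_pull_requests(sample_response, cutoff)
```

In a live script, last_30_days_cutoff() would replace the fixed cutoff used here.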

The problem this solves is significant for large GitHub organizations or projects with high activity. REST APIs often require multiple round trips to gather related data, leading to slow application performance and hitting API rate limits quickly. For instance, getting a list of PRs and then fetching the commit history for each would be a common pattern in REST, involving dozens or hundreds of separate API calls. GraphQL consolidates this into a single request.

In the GraphQL execution model, every field is backed by a resolver: a function that produces the value of that one field. When you make a query, the server doesn’t load a whole table of data and trim it afterwards. Instead, it runs only the resolvers for the fields you selected. For repository(owner: "octocat", name: "Spoon-Knife"), a resolver finds the repository. Then, for pullRequests, another resolver fetches PRs for that repository, applying arguments such as states: OPEN and first: 100. Fields you leave out of the query are never resolved, and that selectivity is what keeps the response limited to what you asked for.
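
The resolver idea can be illustrated with a toy executor. This is a teaching sketch under invented assumptions, not GitHub’s implementation: the data, the resolver functions, and the nested-dict representation of a selection are all made up for the example.

```python
# Toy data standing in for a backend; the values are invented for illustration.
PULL_REQUESTS = [
    {"title": "Fix typo", "state": "OPEN"},
    {"title": "Add docs", "state": "MERGED"},
    {"title": "Refactor", "state": "OPEN"},
]
calls = []  # records which resolvers actually ran


def resolve_repository(parent, args):
    calls.append("repository")
    return {"owner": args["owner"], "name": args["name"]}


def resolve_pull_requests(parent, args):
    calls.append("pullRequests")
    matching = [pr for pr in PULL_REQUESTS if pr["state"] == args["states"]]
    return matching[: args["first"]]


def resolve_stargazer_count(parent, args):
    calls.append("stargazerCount")
    return 42


RESOLVERS = {
    "repository": resolve_repository,
    "pullRequests": resolve_pull_requests,
    "stargazerCount": resolve_stargazer_count,
}


def execute(selection, parent=None):
    """Resolve each requested field; fields absent from `selection` never run."""
    result = {}
    for field, (args, children) in selection.items():
        resolver = RESOLVERS.get(field)
        # Fields without a dedicated resolver are read straight off the parent object.
        value = resolver(parent, args) if resolver else parent[field]
        if children:
            if isinstance(value, list):
                value = [execute(children, item) for item in value]
            else:
                value = execute(children, value)
        result[field] = value
    return result


# Stands in for: repository(...) { pullRequests(first: 10, states: OPEN) { title } }
selection = {
    "repository": (
        {"owner": "octocat", "name": "Spoon-Knife"},
        {"pullRequests": ({"first": 10, "states": "OPEN"}, {"title": ({}, None)})},
    )
}
result = execute(selection)
```

After this runs, calls contains only the repository and pullRequests resolvers; the stargazerCount resolver exists but is never invoked because the selection does not ask for it.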

You control the data flow through the query language itself. The first and after arguments are crucial for pagination, allowing you to fetch large datasets in manageable chunks without overwhelming your client or the server. The server only returns cursor information if you ask for it, so to get the next page of pull requests after the first 100, you’d add pageInfo { endCursor hasNextPage } to the pullRequests selection, then pass the returned endCursor as the after argument in the next request, repeating for as long as hasNextPage is true.
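
A cursor loop along these lines might look like the sketch below. The $after variable name, the fetch_page callable, and the fake pages are assumptions made for the example; in a real script fetch_page would send PAGED_QUERY with the given cursor and return the pullRequests connection from the response.

```python
PAGED_QUERY = """
query($repoOwner: String!, $repoName: String!, $after: String) {
  repository(owner: $repoOwner, name: $repoName) {
    pullRequests(first: 100, states: OPEN, after: $after) {
      pageInfo { endCursor hasNextPage }
      edges { node { title url createdAt } }
    }
  }
}
"""


def fetch_all_pull_requests(fetch_page):
    """Collect nodes from every page.

    `fetch_page(after)` is a caller-supplied (hypothetical) function that runs
    PAGED_QUERY with the given cursor and returns the `pullRequests` connection.
    """
    nodes = []
    cursor = None  # None requests the first page
    while True:
        connection = fetch_page(cursor)
        nodes.extend(edge["node"] for edge in connection["edges"])
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return nodes
        cursor = page_info["endCursor"]


# Stand-in for the API: two fake pages keyed by the cursor that requests them.
FAKE_PAGES = {
    None: {
        "edges": [{"node": {"title": "PR 1"}}, {"node": {"title": "PR 2"}}],
        "pageInfo": {"endCursor": "cursor-2", "hasNextPage": True},
    },
    "cursor-2": {
        "edges": [{"node": {"title": "PR 3"}}],
        "pageInfo": {"endCursor": "cursor-3", "hasNextPage": False},
    },
}
all_prs = fetch_all_pull_requests(lambda after: FAKE_PAGES[after])
```

Passing the page fetcher in as a function keeps the loop independent of how requests are sent, which also makes it easy to exercise without network access.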

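The fragments feature is also extremely powerful for reusability and clarity. If you find yourself requesting the same set of fields in several places (e.g., title, url, createdAt for pull requests and for issues), you can define a fragment and spread it wherever that type appears. A fragment is declared on one schema type, and that type must define every field the fragment selects, so this example uses one fragment per concrete type rather than a single shared one:

fragment PullRequestInfo on PullRequest {
  title
  url
  createdAt
}

fragment IssueInfo on Issue {
  title
  url
  createdAt
}

query {
  repository(owner: "octocat", name: "Spoon-Knife") {
    # Connections that select edges need a first or last argument
    openPullRequests: pullRequests(first: 100, states: OPEN) {
      edges {
        node {
          ...PullRequestInfo
          comments {
            totalCount
          }
        }
      }
    }
    openIssues: issues(first: 100, states: OPEN) {
      edges {
        node {
          ...IssueInfo
          assignees(first: 1) {
            totalCount
          }
        }
      }
    }
  }
}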

This not only cleans up your queries but also ensures consistency in the data you retrieve. It’s a declarative way to build complex data requirements from smaller, reusable pieces.

The most surprising aspect for many is how GraphQL’s schema dictates everything. Unlike REST, where you might discover endpoints through documentation or trial-and-error, GraphQL’s schema is introspectable. You can query the schema itself to understand what types, fields, and arguments are available. This self-describing nature means tools can automatically generate documentation, provide autocompletion in editors, and even validate your queries before they hit the server, all based on that single schema definition.
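
As a small illustration of introspection, the sketch below asks the schema which arguments each field of a type accepts, using the standard __type introspection field. The helper name field_arguments and the trimmed sample response are invented for the example; a real response lists every field of the type, and fields is null for types that have none (such as scalars).

```python
INTROSPECTION_QUERY = """
query($typeName: String!) {
  __type(name: $typeName) {
    name
    fields {
      name
      args { name }
    }
  }
}
"""


def field_arguments(response):
    """Map each field name to the list of argument names it accepts."""
    fields = response["data"]["__type"]["fields"] or []
    return {f["name"]: [a["name"] for a in f["args"]] for f in fields}


# Trimmed, hand-written sample in the shape an introspection response takes.
sample_schema_response = {"data": {"__type": {"name": "Repository", "fields": [
    {"name": "name", "args": []},
    {"name": "pullRequests",
     "args": [{"name": "first"}, {"name": "after"}, {"name": "states"}]},
]}}}

args_by_field = field_arguments(sample_schema_response)
```

Running a query like this against the live schema with "typeName": "Repository" is a quick way to check which arguments a connection such as pullRequests accepts before writing a query against it.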

Once you’ve mastered fetching data, the next logical step is to explore how to modify data using GraphQL mutations.