LangChain’s async chains let you run multiple LLM calls concurrently. When the calls are independent of one another, the total wall-clock time drops from roughly the sum of the individual latencies to roughly the latency of the slowest call.

Let’s see it in action. Imagine you need to summarize two different documents, and you don’t care about the order or intermediate results – you just want both summaries as fast as possible.

```python
import asyncio
# Legacy LangChain API: newer releases move OpenAI to the langchain_openai
# package and prefer chain.ainvoke() over chain.arun().
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

async def run_parallel_summaries():
    llm = OpenAI(temperature=0.7)

    prompt_template = PromptTemplate.from_template("Summarize the following text:\n\n{text}")
    chain = LLMChain(llm=llm, prompt=prompt_template)

    document1 = "This is a very long document about the history of the internet, detailing its origins, key milestones, and future potential. It covers ARPANET, TCP/IP, the World Wide Web, and the rise of social media."
    document2 = "This document explores the intricate world of quantum physics, explaining concepts like superposition, entanglement, and quantum tunneling. It delves into their implications for computing and cryptography."

    # Create tasks for each LLM call; each one is scheduled on the event loop
    task1 = asyncio.create_task(chain.arun(text=document1))
    task2 = asyncio.create_task(chain.arun(text=document2))

    # Wait for both tasks to complete; results come back in argument order
    summary1, summary2 = await asyncio.gather(task1, task2)

    print("Summary 1:", summary1)
    print("Summary 2:", summary2)

if __name__ == "__main__":
    asyncio.run(run_parallel_summaries())
```

This code defines a simple LLMChain. Instead of calling chain.run() twice in sequence, which would wait for the first summary to finish before starting the second, we wrap each chain.arun() call in asyncio.create_task(), which schedules it on the event loop. asyncio.gather() then waits for both tasks. The program doesn’t wait for task1 to complete before starting task2: both requests are in flight at the same time, and gather() returns only once both have finished, with the results in the order the tasks were passed in, regardless of which one finished first.
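To make the ordering behaviour of asyncio.gather() concrete, here is a minimal sketch that needs no API key. The fake_summary coroutine is a hypothetical stand-in for chain.arun(); it sleeps instead of calling an LLM, and the delays are arbitrary values chosen for illustration.

```python
import asyncio


async def fake_summary(name: str, delay: float) -> str:
    # Hypothetical stand-in for chain.arun(): sleeps instead of calling an LLM.
    await asyncio.sleep(delay)
    return f"summary of {name}"


async def main() -> tuple[list[str], list[str]]:
    finished: list[str] = []  # records the order in which calls complete

    async def tracked(name: str, delay: float) -> str:
        result = await fake_summary(name, delay)
        finished.append(name)
        return result

    results = await asyncio.gather(
        tracked("document1", 0.2),   # slower call, listed first
        tracked("document2", 0.05),  # faster call, listed second
    )
    print("completion order:", finished)
    print("result order:", results)
    return results, finished


if __name__ == "__main__":
    asyncio.run(main())
```

The second call completes first, yet the results list still follows the order of the arguments, which is why unpacking into summary1, summary2 is safe.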

The core problem async chains solve is the latency inherent in network-bound operations like LLM calls. A typical chain might involve a prompt, an LLM call, and then perhaps parsing the output. If you have multiple such chains that are independent, running them sequentially means the total time is roughly the sum of their latencies. With async chains, you can overlap the waiting periods: the LLM API call for the first chain is made, and while you’re waiting for its response, the LLM API call for the second chain is initiated. The total time then approaches the latency of the slowest call.
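The difference between summed and overlapped latency can be measured with a small experiment. This is a sketch under stated assumptions: fake_llm_call is a hypothetical placeholder that simulates network latency with asyncio.sleep(), and the 0.2 second delays are illustrative, not typical LLM timings.

```python
import asyncio
import time


async def fake_llm_call(delay: float) -> str:
    # Hypothetical placeholder for a network-bound LLM request.
    await asyncio.sleep(delay)
    return "done"


async def run_sequentially(delays: list[float]) -> float:
    # Each call starts only after the previous one has returned.
    start = time.perf_counter()
    for delay in delays:
        await fake_llm_call(delay)
    return time.perf_counter() - start


async def run_concurrently(delays: list[float]) -> float:
    # All calls are started together; their waiting periods overlap.
    start = time.perf_counter()
    await asyncio.gather(*(fake_llm_call(delay) for delay in delays))
    return time.perf_counter() - start


async def main() -> None:
    delays = [0.2, 0.2, 0.2]
    sequential = await run_sequentially(delays)
    concurrent = await run_concurrently(delays)
    print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

With three simulated calls of equal length, the sequential run should take about three times as long as the concurrent one, since the concurrent run is bounded by the single slowest call.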

Internally, LangChain leverages Python’s asyncio library. The LLMChain class has an arun method that returns an awaitable (a coroutine). When you call chain.arun(), you’re not executing the LLM call immediately; you get back a coroutine object, and nothing runs until it is awaited or scheduled. asyncio.create_task() wraps this coroutine in a Task and schedules it to run on the event loop. asyncio.gather() is the conductor that waits for all the awaitables it is given and returns their results as a list. It also accepts bare coroutines and wraps them in tasks itself, so the explicit create_task() calls in the example are optional. The OpenAI LLM class, and many other LLM integrations in LangChain, have asynchronous counterparts (e.g., OpenAI.agenerate instead of OpenAI.generate) that are designed to be await-ed.
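The point that calling a coroutine function does not run its body can be checked directly with plain asyncio. In this sketch, fake_arun is a hypothetical stand-in for chain.arun(), and the log list exists only to record the order of events.

```python
import asyncio
import inspect


async def main() -> tuple[str, list[str]]:
    log: list[str] = []

    async def fake_arun(text: str) -> str:
        # Hypothetical stand-in for chain.arun().
        log.append("body started")
        await asyncio.sleep(0.01)
        return text.upper()

    coro = fake_arun("doc")            # creates a coroutine object; the body has not run
    assert inspect.iscoroutine(coro)
    log.append("coroutine created")

    task = asyncio.create_task(coro)   # scheduled, but the loop has not had control yet
    log.append("task scheduled")

    result = await task                # yields to the event loop; the body runs now
    log.append("task finished")
    return result, log


if __name__ == "__main__":
    result, log = asyncio.run(main())
    print(result)
    print(log)
```

The log shows "body started" only after "task scheduled", because the task gets its first chance to run when main() yields control at the await.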

The key levers you control are the arun methods of chains and the asyncio primitives like create_task and gather. You can also use asyncio.wait for more fine-grained control over how you react as tasks complete, for example handling each result as soon as it is ready. Any chain that has an arun method can be run asynchronously. This includes custom chains, sequential chains, and even more complex agentic loops if they are structured to yield control back to the event loop.
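Because the pattern depends only on the presence of an arun method, the fan-out can be written once and reused. The following is a sketch, not LangChain code: run_many, SupportsArun, and EchoChain are hypothetical names introduced for illustration, and the protocol assumes arun is called with keyword arguments as in the example above.

```python
import asyncio
from typing import Protocol


class SupportsArun(Protocol):
    # Assumed shape: anything with an async arun(**kwargs) method.
    async def arun(self, **kwargs: str) -> str: ...


class EchoChain:
    """Hypothetical stand-in for a chain; it makes no LLM call."""

    async def arun(self, **kwargs: str) -> str:
        await asyncio.sleep(0.05)  # simulate network latency
        return f"summary({kwargs['text']})"


async def run_many(chain: SupportsArun, texts: list[str]) -> list[str]:
    # Start one arun() call per input and wait for all of them.
    # gather() wraps the bare coroutines in tasks itself.
    return await asyncio.gather(*(chain.arun(text=t) for t in texts))


if __name__ == "__main__":
    print(asyncio.run(run_many(EchoChain(), ["alpha", "beta"])))
```

Swapping EchoChain for a real chain object should only require that its arun accepts the same keyword argument as the prompt template’s input variable.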

A common misconception is that asyncio.gather magically makes the LLM calls themselves faster. It doesn’t. The LLM still takes the same amount of time to process your prompt. What asyncio.gather does is allow your Python program to do other work while waiting for those LLM responses. In this specific pattern, the "other work" is initiating the next LLM call, effectively hiding the latency of one call behind the execution of another.

The next step is to explore how to manage and orchestrate more complex parallel workflows, such as using asyncio.wait with return_when=asyncio.FIRST_COMPLETED to process results as they become available.
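As a starting point, here is a minimal sketch of that pattern using only the standard library. fake_summary is a hypothetical stand-in for chain.arun(), and the delays are illustrative.

```python
import asyncio


async def fake_summary(name: str, delay: float) -> str:
    # Hypothetical stand-in for chain.arun(): sleeps instead of calling an LLM.
    await asyncio.sleep(delay)
    return f"summary of {name}"


async def process_as_completed() -> list[str]:
    # asyncio.wait() expects tasks (or futures), not bare coroutines.
    pending = {
        asyncio.create_task(fake_summary("document1", 0.2)),
        asyncio.create_task(fake_summary("document2", 0.05)),
    }
    handled: list[str] = []
    while pending:
        # Return as soon as at least one task is done; the rest stay pending.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            handled.append(task.result())  # handle each result as it arrives
    return handled


if __name__ == "__main__":
    for summary in asyncio.run(process_as_completed()):
        print(summary)
```

Unlike gather(), this loop yields results in completion order, so the faster document2 summary is handled before document1.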
