The LangChain Multi-Query Retriever does not change how the underlying search works. It improves the retrieved results by asking an LLM to rewrite your original query several times from different perspectives, running every variant against the same base retriever, and merging the results.

Let’s see it in action. Imagine you have a document about the "Apollo 11 mission" and you’re looking for details about the lunar landing itself.

Here’s a simplified setup:

from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.retrievers.multi_query import MultiQueryRetriever

# Sample documents
documents = [
    Document(page_content="Apollo 11 was the first manned mission to land on the Moon. Neil Armstrong and Buzz Aldrin were the astronauts who walked on the lunar surface. Michael Collins piloted the command module in orbit."),
    Document(page_content="The lunar module, named 'Eagle,' separated from the command module 'Columbia' and descended to the Moon's surface. Armstrong famously reported, 'Houston, Tranquility Base here. The Eagle has landed.'"),
    Document(page_content="The mission launched on July 16, 1969, and the lunar module landed on July 20, 1969. The astronauts spent about 21.5 hours on the lunar surface before rejoining Collins."),
    Document(page_content="Buzz Aldrin described the lunar surface as a 'magnificent desolation.' He collected samples and planted the U.S. flag. The mission was a major victory for the United States in the Space Race."),
    Document(page_content="The return journey was successful, with the crew splashing down in the Pacific Ocean on July 24, 1969. The mission achieved its goal of landing humans on the Moon and returning them safely to Earth.")
]

# Initialize embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever()

# LLM for generating sub-queries
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

# MultiQueryRetriever setup
# The prompt here is crucial. It tells the LLM what to do.
# Two constraints come from the retriever itself: the input variable must be
# named "question", and the LLM's reply is split on newlines, so each
# generated query has to sit on its own line.
prompt_template = """
You are an AI language model that specializes in generating diverse and relevant questions to retrieve information from a knowledge base.
Your task is to generate multiple, distinct questions based on the user's original question.
These questions should explore different facets and perspectives of the original query, aiming to capture a broader range of information.
Ensure the generated questions are grammatically correct and clearly phrased.
Avoid simply rephrasing the original question; instead, aim for conceptual variations.

Original Question: {question}

Generate 3-5 diverse questions based on the original question.
Write one question per line, with no numbering or extra commentary.
"""
prompt = PromptTemplate(
    input_variables=["question"],
    template=prompt_template,
)

# Create the MultiQueryRetriever
mq_retriever = MultiQueryRetriever.from_llm(
    retriever=retriever,
    llm=llm,
    prompt=prompt,
    # The number of generated queries is controlled by the prompt wording
    # above; from_llm has no separate parameter for it.
)

# Example usage
original_query = "What happened during the Apollo 11 lunar landing?"
retrieved_docs = mq_retriever.invoke(original_query)

print(f"Original Query: {original_query}\n")
for i, doc in enumerate(retrieved_docs):
    print(f"Retrieved Document {i+1}:")
    print(f"  Content: {doc.page_content[:150]}...")  # Truncate for brevity
    print(f"  Metadata: {doc.metadata}\n")  # Empty here: the sample documents carry no metadata

When you run this, the mq_retriever first sends the original_query to the llm using the provided prompt. The llm then generates a list of sub-queries. For instance, it might produce:

  • "Describe the sequence of events when Apollo 11 touched down on the Moon."
  • "What were the key moments and communications during the lunar module’s descent and landing?"
  • "Tell me about the astronauts’ experience immediately after landing on the Moon."
  • "What was the significance of the landing site for Apollo 11?"
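
The queries above are only an illustration, so it helps to see what the LLM actually produced on a given run. The retriever logs its generated queries at INFO level, and a minimal sketch for surfacing them follows. The logger name is an assumption that follows from the import path used in the setup (langchain.retrievers.multi_query); if your installed version exposes the retriever under a different module, the name would differ.

```python
import logging

# Attach a default handler so log records are printed to stderr.
logging.basicConfig()

# Assumed logger name: it mirrors the module the retriever is imported from.
mq_logger = logging.getLogger("langchain.retrievers.multi_query")

# INFO is the level at which the generated queries are reported.
mq_logger.setLevel(logging.INFO)
```

With this in place before calling mq_retriever.invoke(...), each run should print the list of generated queries alongside the retrieved documents.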

Each of these generated queries is then individually passed to the base retriever (our FAISS vector store). The results from all these sub-queries are then de-duplicated and merged, and finally returned. This means if a document is relevant to any of the generated queries, it has a higher chance of being included in the final set.
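
The fan-out and merge step described above can be sketched in plain Python without any LangChain or OpenAI dependency. Everything here is a stand-in for illustration: CORPUS replaces the vector store, keyword_retriever is a hypothetical word-overlap retriever, and fake_llm_queries returns hard-coded rewrites instead of calling an LLM. It is not LangChain's implementation, only the same shape of control flow.

```python
import re
from typing import Callable, List

# Toy corpus standing in for the FAISS vector store.
CORPUS = [
    "The lunar module Eagle separated from Columbia and descended to the surface.",
    "Armstrong reported: Houston, Tranquility Base here. The Eagle has landed.",
    "The crew splashed down in the Pacific Ocean on July 24, 1969.",
]


def tokens(text: str) -> set:
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retriever(query: str, k: int = 1) -> List[str]:
    """Hypothetical base retriever: rank documents by words shared with the query."""
    ranked = sorted(CORPUS, key=lambda doc: len(tokens(query) & tokens(doc)), reverse=True)
    return ranked[:k]


def fake_llm_queries(question: str) -> List[str]:
    """Hypothetical stand-in for the LLM call: returns fixed rewrites."""
    return [
        "How did the lunar module descend to the surface?",
        "What did Armstrong report to Houston after landing?",
        "Which lunar module descended to the surface?",
    ]


def multi_query_retrieve(
    question: str,
    generate: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Run every generated query through the retriever and merge unique hits in order."""
    merged: List[str] = []
    for query in generate(question):
        for doc in retrieve(query):
            if doc not in merged:  # skip documents already returned by an earlier query
                merged.append(doc)
    return merged


results = multi_query_retrieve(
    "What happened during the Apollo 11 lunar landing?",
    fake_llm_queries,
    keyword_retriever,
)
for doc in results:
    print(doc)
```

The first and third rewrites both hit the same document, so it appears once in the merged list, while the second rewrite contributes a document the others would not have surfaced.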

The core problem this solves is that a single, precisely worded query might miss relevant information if that information is described using slightly different terminology or focuses on a tangential aspect that the LLM can infer. By generating multiple phrasings, you’re essentially casting a wider net for semantically similar but linguistically distinct pieces of information. It’s like asking a question in three different ways to ensure you cover all angles, rather than relying on a single, potentially too-narrow, formulation.

The MultiQueryRetriever is built on top of a standard retriever. You provide it with an existing retriever (like a vector store retriever, a BM25 retriever, etc.) and an LLM. The LLM’s role is solely to generate these diverse queries based on the user’s initial input. The prompt template is critical here: it guides the LLM on what kind of diversity is expected, how many queries to produce, and what format they should be in. The format matters because the retriever splits the LLM’s reply on newlines and treats each line as one query, and the template must take the user’s input through a variable named question. There is no separate parameter for limiting the number of generated queries; the count is controlled through the prompt wording.
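
Because the output format of the LLM determines what gets searched, it can be useful to picture the parsing step. The sketch below is a hypothetical helper (parse_queries and max_queries are names made up for this example, not part of LangChain) that splits a reply into one query per line, strips list markers that models tend to add, and caps the number of queries kept.

```python
import re
from typing import List


def parse_queries(llm_reply: str, max_queries: int = 5) -> List[str]:
    """Split an LLM reply into one query per non-empty line (hypothetical helper)."""
    queries: List[str] = []
    for line in llm_reply.splitlines():
        # Drop leading list markers such as "1.", "2)", "-", "*" or "•".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:  # ignore blank lines between queries
            queries.append(cleaned)
    # Cap the list so a chatty model cannot trigger an unbounded number of searches.
    return queries[:max_queries]


reply = (
    "1. What happened at touchdown?\n"
    "\n"
    "- Who spoke first after landing?\n"
    "• Where did the Eagle land?\n"
    "Why was Tranquility Base chosen?"
)
all_queries = parse_queries(reply)
capped_queries = parse_queries(reply, max_queries=3)
```

Asking for "one question per line, no numbering" in the prompt reduces the need for this kind of cleanup, but a defensive parser is a reasonable safeguard when the model does not follow the format exactly.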

The "magic" happens in how the LLM interprets the original query and brainstorms related, but distinct, ways to ask about the same underlying topic. It leverages its understanding of language and context to identify synonyms, related concepts, and different angles of inquiry. For example, if your original query is "What is the capital of France?", the LLM might generate: "Which city serves as the administrative center of France?", "Name the major global city that is the seat of French government," or "What is the primary urban hub for political and economic activity in France?". Each of these, when passed to a retriever, might pull documents that use different phrasing to describe Paris.

De-duplication is handled internally. After all sub-queries are run through the base retriever, their results are aggregated and LangChain keeps only the first occurrence of each document. In current releases this is an equality check on the Document objects (page content together with metadata), not a similarity comparison, so two chunks with near-identical wording but different metadata both survive. The final output is a single list of Document objects, each of which was returned for at least one of the generated queries.
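
A small sketch makes the consequence of de-duplicating by equality concrete. The Doc dataclass below is a hypothetical stand-in for LangChain's Document, and unique_docs is an illustrative helper, not the library's function; the assumption is only that two documents count as duplicates when both content and metadata match.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document: content plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def unique_docs(docs: List[Doc]) -> List[Doc]:
    """Keep the first occurrence of each document, preserving retrieval order."""
    unique: List[Doc] = []
    for doc in docs:
        if doc not in unique:  # dataclass equality compares content and metadata
            unique.append(doc)
    return unique


hits = [
    Doc("The Eagle has landed.", {"source": "transcript"}),
    Doc("The Eagle has landed.", {"source": "transcript"}),  # exact duplicate
    Doc("The Eagle has landed.", {"source": "press kit"}),   # same text, different source
]
deduped = unique_docs(hits)
```

The exact duplicate is dropped, while the copy with a different source is kept. If your index contains the same text under several sources, a content-only de-duplication pass after retrieval may be worth adding.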

A subtle point often overlooked is the LLM’s inherent biases and limitations in query generation. If the LLM has a tendency to focus on specific aspects (e.g., always generating queries about historical context when asked about a scientific discovery), the MultiQueryRetriever will inherit that bias, potentially still missing information if it lies outside the LLM’s generated query scope. Tuning the prompt to explicitly ask for different types of questions (e.g., "include questions about technical details, historical context, and future implications") can help mitigate this.
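
One way to act on this is to spell out the angles you want covered instead of leaving the choice to the model. The sketch below builds such a prompt as a plain string. FACETS and build_faceted_prompt are hypothetical names for illustration, and the facet list is taken from the example wording above, not from any LangChain default.

```python
from typing import Sequence

# Angles the generated queries should cover (illustrative, adjust per domain).
FACETS = ("technical details", "historical context", "future implications")


def build_faceted_prompt(question: str, facets: Sequence[str] = FACETS) -> str:
    """Build a query-generation prompt that asks for one question per facet."""
    facet_lines = "\n".join(f"- one question about {facet}" for facet in facets)
    return (
        "Rewrite the user's question as several search queries.\n"
        "Cover each of these angles:\n"
        f"{facet_lines}\n"
        "Write one query per line, with no numbering.\n\n"
        f"Original question: {question}"
    )


faceted_prompt = build_faceted_prompt("What happened during the Apollo 11 lunar landing?")
print(faceted_prompt)
```

To use this with MultiQueryRetriever, the same wording would go into a PromptTemplate, with a template variable in place of the interpolated question.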

The next logical step after improving retrieval quality is to ensure the LLM can effectively synthesize the retrieved information into a coherent answer, which leads into concepts like summarization chains or advanced RAG architectures.
