The most surprising thing about querying across multiple documents with LlamaIndex is that it never needs to fit all of your documents into the LLM’s prompt at once to answer questions about them.

Let’s see it in action. Imagine you have two text files, doc1.txt and doc2.txt.

doc1.txt:

The quick brown fox jumps over the lazy dog.
The dog's name was Bartholomew.

doc2.txt:

The cat sat on the mat.
The cat's favorite toy was a red ball.
Bartholomew the dog was afraid of cats.
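
If you would rather create the two files from a script than by hand, the following is a minimal sketch using only the standard library. The helper name write_sample_docs is made up for this example; the file names and contents are the ones shown above, and the default directory matches the 'data' directory used later.

```python
from pathlib import Path

# File contents copied from the two sample documents above.
DOCS = {
    "doc1.txt": (
        "The quick brown fox jumps over the lazy dog.\n"
        "The dog's name was Bartholomew.\n"
    ),
    "doc2.txt": (
        "The cat sat on the mat.\n"
        "The cat's favorite toy was a red ball.\n"
        "Bartholomew the dog was afraid of cats.\n"
    ),
}


def write_sample_docs(directory="data"):
    """Create the directory if needed and write both sample files into it.

    Hypothetical helper for this walkthrough; returns the sorted file names.
    """
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    for name, text in DOCS.items():
        (target / name).write_text(text, encoding="utf-8")
    return sorted(path.name for path in target.iterdir())
```

Calling write_sample_docs() with no argument writes into ./data relative to the current working directory.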

Here’s how you’d set up LlamaIndex to query these:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI # Or your preferred LLM

# Configure your LLM (the OpenAI client reads its key from the OPENAI_API_KEY environment variable)
Settings.llm = OpenAI(model="gpt-3.5-turbo")

# Load documents from a directory
# Ensure doc1.txt and doc2.txt are in a directory named 'data'
documents = SimpleDirectoryReader("./data").load_data()

# Create an index for these documents
index = VectorStoreIndex.from_documents(documents)

# Create a query engine from the index
query_engine = index.as_query_engine()

# Create a tool from the query engine
query_engine_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="document_query_engine",
        description="A query engine that can answer questions about the provided documents.",
    ),
)

# Create the agent (from_tools is the classic agent API; check the docs for your installed version if it is missing)
agent = ReActAgent.from_tools([query_engine_tool], verbose=True)

# Query the agent
response = agent.chat("What is the name of the dog and what is it afraid of?")
print(response)

When you run this, LlamaIndex doesn’t dump everything into one giant prompt. Instead, an agent decides which tools to call and what input to give them. In this case, our only tool is the document_query_engine. The agent will:

  1. Receive the query: "What is the name of the dog and what is it afraid of?"
  2. Consult its tools: It sees document_query_engine.
  3. Formulate a sub-query (if needed): It might break this down internally or directly ask the document_query_engine a question like "What is the dog’s name?" and then "What is the dog afraid of?".
  4. Execute the sub-query: The document_query_engine uses its underlying vector index to retrieve the most relevant chunks of text from doc1.txt and doc2.txt. For "What is the dog’s name?", the relevant text is "The dog’s name was Bartholomew." in doc1.txt. For "What is the dog afraid of?", it is "Bartholomew the dog was afraid of cats." in doc2.txt. (With files this small, each file is likely to be a single chunk.)
  5. Synthesize the answer: The agent then takes these pieces of information and constructs the final answer: "The dog’s name is Bartholomew, and it is afraid of cats."
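
The steps above can be mimicked with a toy loop. This is only a sketch and not how LlamaIndex is implemented: the plan is hard-coded rather than produced by an LLM, the tool picks a sentence by simple word overlap instead of a vector index, and the names TOOLS and run_agent are invented for the illustration.

```python
import re

# The sample documents, split into (source file, sentence) pairs.
SENTENCES = [
    ("doc1.txt", "The quick brown fox jumps over the lazy dog."),
    ("doc1.txt", "The dog's name was Bartholomew."),
    ("doc2.txt", "The cat sat on the mat."),
    ("doc2.txt", "The cat's favorite toy was a red ball."),
    ("doc2.txt", "Bartholomew the dog was afraid of cats."),
]


def _words(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def document_query_engine(question):
    """Toy tool: return the sentence sharing the most words with the question."""
    return max(SENTENCES, key=lambda item: len(_words(question) & _words(item[1])))


# Step 2: the agent's tools, looked up by name.
TOOLS = {"document_query_engine": document_query_engine}


def run_agent(plan):
    """Run each (tool_name, sub_query) step and collect the observations."""
    observations = []
    for tool_name, sub_query in plan:
        tool = TOOLS[tool_name]
        observations.append(tool(sub_query))  # step 4: execute the sub-query
    return observations


# Step 3: a hard-coded stand-in for the sub-queries an LLM might formulate.
plan = [
    ("document_query_engine", "What is the dog's name?"),
    ("document_query_engine", "What is the dog afraid of?"),
]

for source, sentence in run_agent(plan):
    print(f"{source}: {sentence}")
```

In the real agent, step 5 hands these observations back to the LLM, which writes the final sentence; the sketch stops at printing them.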

The core problem this solves is scale: context windows are limited, so if you have thousands of documents you cannot paste them all into a single LLM prompt. LlamaIndex retrieves selectively instead, which makes it feasible to query large collections.

When the index is built, each document is split into chunks, an embedding is computed for every chunk, and the embeddings are stored in a vector store. When you ask a question, the question is converted into an embedding as well, and the most semantically similar chunks are pulled from the vector store. This is the retrieval part. The LLM then reasons over the retrieved chunks to form a coherent answer.

You control the agent’s behavior through the tools you provide and the descriptions you give them in ToolMetadata. The description is crucial; it’s how the agent understands what each tool is capable of.
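
To make the retrieval step concrete, here is a minimal sketch of similarity search over the sample sentences. It assumes a toy "embedding" (plain word counts) in place of a real embedding model, and a Python list in place of a vector store; embed, cosine, and retrieve are names chosen for this example, not LlamaIndex functions.

```python
import math
import re
from collections import Counter

CHUNKS = [
    "The quick brown fox jumps over the lazy dog.",
    "The dog's name was Bartholomew.",
    "The cat sat on the mat.",
    "The cat's favorite toy was a red ball.",
    "Bartholomew the dog was afraid of cats.",
]


def embed(text):
    """Toy embedding: a sparse word-count vector (a real model returns dense floats)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Indexing": embed every chunk once, up front.
INDEX = [(chunk, embed(chunk)) for chunk in CHUNKS]


def retrieve(question, top_k=2):
    """Embed the question and return the top_k most similar chunks."""
    query_vector = embed(question)
    ranked = sorted(INDEX, key=lambda pair: cosine(query_vector, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


print(retrieve("What is the dog afraid of?", top_k=1))
```

Word counts only match on exact words, so this toy version would miss paraphrases that a learned embedding can catch; the ranking logic is the part that carries over.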

What most people don’t realize is that the ReActAgent (or any other agent type) doesn’t blindly forward your query to the tool. It reasons first, often breaking a complex query into simpler sub-queries that the underlying query engines then execute. This lets it chain calls to several tools, or call the same tool several times with different inputs, effectively building a plan to arrive at the answer. Because the agent above was created with verbose=True, each reasoning step and tool call is printed as it happens.
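
A ReAct-style reasoning step is usually plain text that names a thought, a tool, and the tool's input. The sketch below parses one such step. The exact wording of the trace depends on the LlamaIndex version and prompt, so the Thought / Action / Action Input layout, the "input" key, and the parse_step helper are all assumptions made for illustration.

```python
import json
import re

# Assumed layout of a single tool-calling step in a ReAct-style trace.
STEP_PATTERN = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>\S+)\s*"
    r"Action Input:\s*(?P<action_input>\{.*\})",
    re.DOTALL,
)


def parse_step(text):
    """Split one ReAct-style step into its thought, tool name, and JSON input."""
    match = STEP_PATTERN.search(text)
    if match is None:
        raise ValueError("not a tool-calling step")
    return {
        "thought": match.group("thought"),
        "action": match.group("action"),
        "action_input": json.loads(match.group("action_input")),
    }


# A hand-written example step, not captured from a real run.
example = (
    "Thought: I need the dog's name first.\n"
    "Action: document_query_engine\n"
    'Action Input: {"input": "What is the dog\'s name?"}'
)

step = parse_step(example)
print(step["action"], "->", step["action_input"]["input"])
```

Each parsed step names one tool call; a chain of such steps, each followed by the tool's observation, is what forms the plan described above.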

The next step is to explore how to give your agent multiple, distinct tools, not just a single document query engine.
