LlamaIndex agents don’t just use tools; they’re designed to select and orchestrate them on the fly, from the set you register, based on the user’s intent.

Let’s see what this looks like in practice. Imagine an agent that can search the web, look up information in a PDF, and even perform a basic calculator operation.

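# These import paths match llama_index releases before 0.10.
# From 0.10 on, the same classes live under llama_index.core.agent,
# llama_index.llms.openai, and llama_index.core.tools.
from llama_index.agent import ReActAgent
from llama_index.llms import OpenAI
from llama_index.tools import FunctionTool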

# Assume you have loaded a PDF into a list of documents, `pdf_docs`,
# and built an index over them called `vector_store_index`

# Define tools
def pdf_lookup(query: str) -> str:
    """Looks up information in a PDF document."""
    # Query the index, not the raw documents: the index is what provides as_query_engine()
    query_engine = vector_store_index.as_query_engine(similarity_top_k=3)
    response = query_engine.query(query)
    return str(response)

def web_search(query: str) -> str:
    """Searches the web for information."""
    # In a real scenario, this would use a web search API like Tavily or SerpAPI
    return f"Simulated web search for: {query}. Results: Relevant information found."

def calculator(a: int, b: int) -> int:
    """Performs addition on two integers."""
    return a + b

# Create the agent
llm = OpenAI(model="gpt-3.5-turbo")
agent = ReActAgent.from_tools(
    [
        FunctionTool.from_defaults(fn=pdf_lookup),
        FunctionTool.from_defaults(fn=web_search),
        FunctionTool.from_defaults(fn=calculator),
    ],
    llm=llm,
    verbose=True,
)

# Ask a question that requires multiple tools
response = agent.chat("What is the capital of France, and what is 2 + 3?")
print(response)

When you run this, you’ll see the agent’s thought process: it identifies that it needs to find the capital of France (likely using web search) and perform a calculation. It then decides on the order, executes the tools, and synthesizes the answer. The verbose=True flag is your window into this process, showing you the tool calls and their outputs.

The core problem LlamaIndex agents solve is bridging the gap between a large language model’s understanding and the ability to take concrete actions in the real world or within specific data sources. LLMs are excellent at reasoning and generating text, but they can’t directly query a database or browse the internet. Tools provide that bridge. LlamaIndex’s ReActAgent (Reasoning and Acting) is a prime example. It uses a prompt-based approach to:

  1. Reason: Understand the user’s query and identify what information or action is needed.
  2. Select Tool(s): Choose the most appropriate tool(s) from a given set based on the query.
  3. Act: Call the selected tool with the correct arguments.
  4. Observe: Get the output from the tool.
  5. Iterate: Repeat the process, potentially using the output of one tool as input for another, until the final answer is synthesized.
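
The five steps above can be sketched as a small loop. This is a simplified illustration, not LlamaIndex’s actual implementation: `scripted_llm` is a hypothetical stand-in for a real model call, and the `Thought / Action / Action Input / Observation / Answer` text format is an assumption chosen for readability.

```python
import re
from typing import Callable, Dict


def calculator(a: int, b: int) -> int:
    """Performs addition on two integers."""
    return a + b


# Registry the loop selects from, keyed by tool name.
TOOLS: Dict[str, Callable[..., int]] = {"calculator": calculator}


def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for an LLM; a real agent would call a model here."""
    if "Observation:" not in transcript:
        # No tool output yet, so request a tool call.
        return (
            "Thought: I need to add the two numbers.\n"
            "Action: calculator\n"
            "Action Input: 2, 3"
        )
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the result.\nAnswer: 2 + 3 = {observation}"


def run_react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = scripted_llm(transcript)  # 1. Reason
        transcript += "\n" + reply

        answer = re.search(r"Answer:\s*(.*)", reply)
        if answer:
            return answer.group(1).strip()  # final answer synthesized

        action = re.search(r"Action:\s*(\w+)", reply)  # 2. Select tool
        action_input = re.search(r"Action Input:\s*(.*)", reply)
        if action is None or action_input is None:
            return "Stopped: the reply contained neither an action nor an answer."

        args = [int(part) for part in action_input.group(1).split(",")]
        result = TOOLS[action.group(1)](*args)  # 3. Act
        transcript += f"\nObservation: {result}"  # 4. Observe, then 5. iterate
    return "Stopped: step limit reached."


print(run_react_loop("What is 2 + 3?"))  # prints: 2 + 3 = 5
```

The step limit matters in practice: without it, a model that never emits a final answer would loop forever.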

The FunctionTool class is key here. It wraps your Python functions, making them discoverable by the agent. When you define a function with type hints and a docstring, LlamaIndex uses these to infer how to call the function and what it does, passing this information to the LLM. The LLM then uses this metadata to decide if and how to use your function.
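
To make the idea concrete, here is a rough sketch of how a function’s name, docstring, and type hints can be turned into a description an LLM could read. It uses only the standard library’s `inspect` module; `describe_tool` is a hypothetical helper, and the metadata that FunctionTool actually builds is richer than this.

```python
import inspect
from typing import Any, Callable, Dict


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Hypothetical helper: collect what an LLM would need to know about fn."""
    signature = inspect.signature(fn)
    parameters = {
        # Use the type's name when the annotation is a plain class such as int.
        name: getattr(param.annotation, "__name__", str(param.annotation))
        for name, param in signature.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": parameters,
    }


def calculator(a: int, b: int) -> int:
    """Performs addition on two integers."""
    return a + b


metadata = describe_tool(calculator)
print(metadata)
```

A function with no docstring yields an empty description here, which is a good reminder of why docstrings matter: without one, the model has only the function name to go on.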

The "tools" themselves can be anything you can represent as a Python function:

  • Data Retrieval: Querying databases, searching vector stores (like the PDF example), accessing APIs.
  • Action Execution: Sending emails, scheduling meetings, controlling smart devices.
  • Computation: Performing complex calculations, running simulations.

The agent’s ability to chain these tools together is where the real power lies. For instance, a user might ask, "Find me recent news about renewable energy and summarize the key findings." The agent would first use a web search tool to find news articles, then potentially a summarization tool (or use its own LLM capabilities) to condense the information, and finally present the summary.
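
The data flow in that example can be sketched as below. Both tools are hypothetical stand-ins with canned behavior, and the order of calls is hard-coded here, whereas an agent would decide it at run time from the user’s request.

```python
from typing import List


def web_search(query: str) -> List[str]:
    """Hypothetical search tool: returns canned headlines instead of calling an API."""
    return [
        f"Headline A about {query}",
        f"Headline B about {query}",
        f"Headline C about {query}",
    ]


def summarize(articles: List[str], max_items: int = 2) -> str:
    """Hypothetical summarizer: keeps the first max_items headlines."""
    return "; ".join(articles[:max_items])


def news_digest(topic: str) -> str:
    articles = web_search(topic)  # first tool call
    return summarize(articles)    # its output becomes the second tool's input


print(news_digest("renewable energy"))
```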

The specific arguments passed to a tool are determined by the LLM’s interpretation of the user’s query and the tool’s signature (its function definition). If the LLM needs to call calculator(a: int, b: int) and the user asks "what is 5 plus 7?", the LLM will infer a=5 and b=7. This inference process is crucial and relies heavily on the LLM’s capabilities and the clarity of your tool descriptions (docstrings).
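
Once the LLM has proposed arguments, something has to match them against the function signature before the call happens. The sketch below assumes the model returns its arguments as a JSON object, which is a common convention but an assumption here. `call_with_llm_arguments` is a hypothetical helper, and its type coercion only handles simple annotations such as `int`, `float`, and `str`.

```python
import inspect
import json
from typing import Any, Callable, Dict


def calculator(a: int, b: int) -> int:
    """Performs addition on two integers."""
    return a + b


def call_with_llm_arguments(fn: Callable[..., Any], raw_arguments: str) -> Any:
    """Hypothetical helper: parse JSON arguments, coerce them, and call fn."""
    arguments: Dict[str, Any] = json.loads(raw_arguments)
    coerced: Dict[str, Any] = {}
    for name, param in inspect.signature(fn).parameters.items():
        if name not in arguments:
            raise TypeError(f"{fn.__name__} is missing argument '{name}'")
        value = arguments[name]
        if param.annotation is not inspect.Parameter.empty:
            # Models sometimes return numbers as strings; convert to the hinted type.
            value = param.annotation(value)
        coerced[name] = value
    return fn(**coerced)


# "what is 5 plus 7?" might come back from the model as:
print(call_with_llm_arguments(calculator, '{"a": "5", "b": 7}'))  # prints: 12
```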

One behavior that surprises many people is how the agent handles ambiguity or missing information. If a tool requires an argument that the user’s query doesn’t supply, the agent doesn’t necessarily fail. It will often ask the user for clarification instead. For example, if the user asks "Calculate the sum," the agent might respond, "I can do that! What two numbers would you like to add?" This behavior comes out of the reasoning loop: the LLM sees the tool’s required parameters, recognizes that it cannot fill them in from the query, and responds with a question rather than a tool call. Because it depends on the LLM, it is common but not guaranteed.
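
In an agent this check happens inside the LLM’s reasoning, but the underlying idea can be written out explicitly. The sketch below is a deterministic illustration under that assumption: `missing_arguments` and `call_or_clarify` are hypothetical helpers that compare the supplied arguments against the function signature and return a question when something required is absent.

```python
import inspect
from typing import Any, Callable, Dict, List


def calculator(a: int, b: int) -> int:
    """Performs addition on two integers."""
    return a + b


def missing_arguments(fn: Callable[..., Any], provided: Dict[str, Any]) -> List[str]:
    """Return the names of required parameters that were not provided."""
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [
        name
        for name, param in inspect.signature(fn).parameters.items()
        if param.kind not in variadic
        and param.default is inspect.Parameter.empty
        and name not in provided
    ]


def call_or_clarify(fn: Callable[..., Any], provided: Dict[str, Any]) -> str:
    """Call fn if every required argument is present; otherwise ask for the rest."""
    missing = missing_arguments(fn, provided)
    if missing:
        return (
            f"I need more information before calling {fn.__name__}: "
            f"please provide {', '.join(missing)}."
        )
    return str(fn(**provided))


print(call_or_clarify(calculator, {}))
print(call_or_clarify(calculator, {"a": 5, "b": 7}))
```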

The next step in building sophisticated agents is often handling tool failures gracefully and implementing more complex orchestration logic, such as conditional execution or parallel tool calls.
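
As a starting point for both ideas, here is a minimal sketch using only the standard library. It is not a LlamaIndex API: `safe_call` and `run_parallel` are hypothetical helpers, and `flaky_search` is a made-up tool that always fails. The design choice is to return an error as text, so the agent can treat it as just another observation and decide what to do next.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple


def calculator(a: int, b: int) -> int:
    """Performs addition on two integers."""
    return a + b


def flaky_search(query: str) -> str:
    """Hypothetical tool that always fails, to exercise the error path."""
    raise RuntimeError("search service unavailable")


def safe_call(fn: Callable[..., Any], **kwargs: Any) -> str:
    """Run a tool and turn any exception into an observation string."""
    try:
        return str(fn(**kwargs))
    except Exception as exc:
        return f"Error: {type(exc).__name__}: {exc}"


def run_parallel(calls: List[Tuple[Callable[..., Any], Dict[str, Any]]]) -> List[str]:
    """Run independent tool calls concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(calls) or 1) as pool:
        futures = [pool.submit(safe_call, fn, **kwargs) for fn, kwargs in calls]
        return [future.result() for future in futures]


results = run_parallel([
    (calculator, {"a": 2, "b": 3}),
    (flaky_search, {"query": "capital of France"}),
])
print(results)
```

Threads suit tools that wait on I/O, such as web requests; CPU-bound tools would need a different approach.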
