LlamaIndex OpenAI Agents: Build Tool-Using AI Agents (2026)

The most surprising thing about LlamaIndex OpenAI agents is that they don’t reason the way you or I might; they are sophisticated pattern matchers that rely on the "tool use" capabilities of large language models.

Let’s see one in action. Imagine we want an agent that can answer questions about our company’s internal documentation. We can give it access to a vector store containing our documents and a search tool.

import os
from llama_index.core import VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent

# Assume you have your OpenAI API key set as an environment variable
# os.environ["OPENAI_API_KEY"] = "sk-..."

# Load your documents and create a vector store index
# For this example, let's simulate a simple index
from llama_index.core import Document
documents = [Document(text="The Q3 sales target for the North American region is $5 million.")]
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Define a tool that uses the query engine
def query_internal_docs(question: str) -> str:
    """Queries the internal documentation for answers to a question."""
    response = query_engine.query(question)
    # str() yields the answer text for any response type the query engine returns
    return str(response)

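# Create an OpenAI LLM instance
# ReActAgent drives tools through plain-text prompting, so any capable chat model works
Settings.llm = OpenAI(model="gpt-3.5-turbo")

# Wrap the function as a tool; its name, signature, and docstring become the tool description
docs_tool = FunctionTool.from_defaults(fn=query_internal_docs)

# Create the agent; verbose=True prints each reasoning step
# Note: from_tools belongs to the classic agent API and may differ in your installed LlamaIndex version
agent = ReActAgent.from_tools([docs_tool], llm=Settings.llm, verbose=True)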

# Now, let's ask a question
response = agent.chat("What was the Q3 sales target for North America?")
print(response)

When you run this with verbose output enabled (verbose=True on the agent), you’ll see a trace that looks roughly like this:

Thought: The user is asking a question about the Q3 sales target for North America. I have a tool that can query internal documentation. I should use that tool to find the answer.
Action: query_internal_docs
Action Input: {"question": "What was the Q3 sales target for North America?"}
Observation: The Q3 sales target for the North American region is $5 million.
Thought: I have found the answer to the user's question. I should now respond to the user.
Answer: The Q3 sales target for the North American region is $5 million.

This output shows the agent’s internal "thought process." It recognized the user’s intent, identified the appropriate tool (query_internal_docs), formulated the input for that tool, received the observation (the answer from the tool), and then constructed the final response.

The core problem these agents solve is bridging the gap between a user’s natural language request and the structured, programmatic actions required to fulfill that request. LLMs are great at understanding language, but they can’t directly interact with external systems like databases, APIs, or even local files. Agents provide this bridge by allowing the LLM to "call" predefined tools.

Internally, LlamaIndex agents, particularly the ReActAgent, operate on a loop inspired by the ReAct (Reasoning and Acting) framework. The LLM receives the user’s query and the descriptions of available tools. It then reasons about which tool, if any, is most appropriate. If a tool is chosen, the LLM generates the arguments for that tool. The tool is executed, and its output (the "observation") is fed back into the LLM. The LLM then uses this observation to either refine its next action or formulate the final answer. This cycle repeats until the agent determines it has sufficient information to answer the user’s query.
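
To make the loop concrete, here is a minimal sketch in plain Python. It is not LlamaIndex's implementation: scripted_llm is a hypothetical stand-in for a real model, run_react_loop and its max_steps parameter are names invented for this illustration, and the Thought/Action/Answer text format is assumed for the sake of the example.

```python
import json
import re
from typing import Callable, Dict


def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for an LLM: chooses the next step from the transcript."""
    if "Observation:" not in transcript:
        # No tool result yet, so ask for the documentation tool.
        return (
            "Thought: I should look this up in the internal docs.\n"
            "Action: query_internal_docs\n"
            'Action Input: {"question": "Q3 sales target North America"}'
        )
    # A tool result is available, so answer with the latest observation.
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I can answer now.\nAnswer: {observation}"


def query_internal_docs(question: str) -> str:
    """Stub tool that returns a canned answer instead of querying an index."""
    return "The Q3 sales target for the North American region is $5 million."


def run_react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[..., str]],
    max_steps: int = 5,
) -> str:
    """Alternate between LLM steps and tool calls until an answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"

        # A final answer ends the loop.
        answer = re.search(r"Answer: (.*)", step, re.DOTALL)
        if answer:
            return answer.group(1).strip()

        # Otherwise, parse the requested tool call.
        action = re.search(r"Action: (\w+)", step)
        action_input = re.search(r"Action Input: (\{.*\})", step, re.DOTALL)
        if not action or action.group(1) not in tools:
            transcript += "Observation: Error: unknown or missing tool.\n"
            continue

        # Run the tool and feed its output back as the observation.
        kwargs = json.loads(action_input.group(1)) if action_input else {}
        observation = tools[action.group(1)](**kwargs)
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached without a final answer."


result = run_react_loop(
    "What was the Q3 sales target for North America?",
    scripted_llm,
    {"query_internal_docs": query_internal_docs},
)
print(result)
```

The step limit is a design choice worth keeping in any real loop: without it, a model that never produces a final answer would call tools indefinitely.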

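The "tools" you define are essentially Python functions with clear docstrings. The LLM uses these docstrings to understand what the tool does and how to call it. When you create a FunctionTool, LlamaIndex derives a name, a description, and a parameter schema from the function’s signature and docstring. With a function-calling agent, that schema is passed to the OpenAI API’s function calling feature, and the LLM returns JSON arguments that adhere to it. With the ReActAgent used above, the same descriptions are written into the prompt instead, and the LLM emits the tool name and its JSON arguments as text for the agent to parse.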
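
As a rough illustration of how a function can be turned into a tool description, the sketch below builds a JSON-Schema-like dictionary from a signature and docstring using only the standard library. The helper describe_tool, the type mapping, and the extra top_k parameter are hypothetical and exist only for this example; LlamaIndex's own schema generation is more thorough.

```python
import inspect
from typing import Any, Callable, Dict

# Simplified mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a function-calling style description from a plain Python function."""
    signature = inspect.signature(fn)
    properties = {}
    required = []
    for name, param in signature.parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value must be supplied by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def query_internal_docs(question: str, top_k: int = 2) -> str:
    """Queries the internal documentation for answers to a question."""
    return f"stub answer for {question!r} using {top_k} results"


schema = describe_tool(query_internal_docs)
print(schema)
```

Because the description is the only thing the model sees about a tool, a vague docstring or missing type hints translate directly into worse tool selection.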

A subtle but crucial aspect of agent behavior is how it handles ambiguity, or cases where no tool is suitable. If the LLM decides no tool can answer the question, it will typically state that it cannot fulfill the request with the available tools. You can influence this by carefully crafting your tool descriptions and by adding relevant context to the agent’s initial prompt. For instance, if a user asks about a specific product that is not in your documentation, and your query_internal_docs tool only searches that documentation, a well-prompted agent should report that it can’t find the information rather than invent an answer. This is not guaranteed, because the underlying LLM can still hallucinate, so it helps when the tool itself returns an unambiguous "nothing found" result.
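
One way to nudge the agent toward an honest "not found" reply is to have the tool return an explicit message when nothing matches. The sketch below assumes a toy keyword-overlap search in place of the real query engine; DOCS, NOT_FOUND, and the min_overlap threshold are hypothetical names chosen for this illustration.

```python
from typing import List, Optional, Set

# Toy corpus standing in for the indexed internal documentation.
DOCS: List[str] = [
    "The Q3 sales target for the North American region is $5 million.",
]

NOT_FOUND = "No relevant information was found in the internal documentation."


def _keywords(text: str) -> Set[str]:
    """Lowercase words longer than three characters, with punctuation stripped."""
    return {w.strip(".,?!$").lower() for w in text.split() if len(w) > 3}


def query_internal_docs(question: str, min_overlap: int = 2) -> str:
    """Searches internal docs; returns an explicit not-found message when nothing matches."""
    question_words = _keywords(question)
    best_doc: Optional[str] = None
    best_score = 0
    for doc in DOCS:
        # Score each document by how many keywords it shares with the question.
        score = len(question_words & _keywords(doc))
        if score > best_score:
            best_doc, best_score = doc, score
    # Below the threshold, say so explicitly instead of returning a weak match.
    if best_doc is None or best_score < min_overlap:
        return NOT_FOUND
    return best_doc


print(query_internal_docs("What was the Q3 sales target for North America?"))
print(query_internal_docs("What is the price of the Widget Pro?"))
```

An explicit observation like this gives the LLM something concrete to relay to the user, which tends to be more reliable than expecting it to infer absence from an empty or loosely related result.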

The next step in building sophisticated agents involves managing multiple tools, handling tool failures gracefully, and implementing more complex reasoning chains where one tool’s output feeds into another.