A LangChain Agentic RAG system doesn’t find information; it negotiates with information.

Let’s watch this agent work through a query about "the impact of renewable energy on grid stability."

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.tools import tool

# --- Setup ---
# Load documents
loader = WebBaseLoader("https://www.nrel.gov/grid/grid-stability.html")
docs = loader.load()

# Split documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Create embeddings and vector store
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# --- Agent Setup ---
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Define tools
@tool
def retrieve_documents(query: str) -> str:
    """Retrieves relevant documents from the knowledge base."""
    # invoke() is the current retriever entry point; get_relevant_documents() is deprecated
    retrieved_docs = retriever.invoke(query)
    return "\n".join([doc.page_content for doc in retrieved_docs])

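# Prompt for the agent.
# create_tool_calling_agent needs a ChatPromptTemplate (not a plain string) that
# includes an agent_scratchpad placeholder for tool calls and tool results.
# Retrieved text reaches the model through that placeholder, so no {context} variable is needed.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", """You are an AI assistant that answers questions by retrieving and synthesizing information from documents.
You have access to a tool that can retrieve relevant documents.

When answering a question:
1. Call the retrieval tool with a search query based on the question.
2. If the initial retrieval doesn't provide enough information, refine your retrieval query and retrieve again.
3. If you are still unsure, state that you cannot answer the question with the available information."""),
    ("human", "{question}"),
    ("placeholder", "{agent_scratchpad}"),
])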

# Agent and Executor
tools = [retrieve_documents]
agent = create_tool_calling_agent(llm, tools, prompt_template)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# --- Execution ---
response = agent_executor.invoke({"question": "How does the increasing penetration of renewable energy sources affect the stability of the electricity grid?"})
print(response["output"])

When you run the above, the verbose trace shows the agent first calling retrieve_documents, usually with the original question or a close paraphrase. It gets some text back. If that text is off-topic or not comprehensive enough, the agent can call retrieve_documents again with a modified query. This is the "corrective" part: it doesn’t just accept the first pass. The "adaptive" part comes from its ability to change its strategy (its retrieval query) based on the initial results.

The core problem this solves is the inherent limitation of static retrieval. A simple RAG system takes a query, embeds it, and finds the most similar chunks. But often, the best answer isn’t in the single most similar chunk, or the initial query might be too broad, too narrow, or phrased in a way the vector search doesn’t perfectly understand. Agentic RAG introduces a loop: retrieve, assess, refine. The LLM acts as the orchestrator, deciding if retrieval is sufficient and how to improve it if it’s not. It uses its reasoning capabilities to formulate better search terms or to identify what information is missing, guiding the retriever more intelligently.
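The retrieve, assess, refine loop can be sketched without any LLM at all. The snippet below is a minimal illustration, not LangChain's implementation: agentic_retrieve, the toy corpus, and the three toy_* helpers are hypothetical stand-ins, and in a real agent the LLM would play the roles of is_sufficient and refine.

```python
from typing import Callable, List


def agentic_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    refine: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> List[str]:
    """Retrieve, assess, and refine until the evidence looks sufficient."""
    query = question
    evidence: List[str] = []
    for _ in range(max_rounds):
        # Retrieve: collect any chunks we have not seen yet.
        for chunk in retrieve(query):
            if chunk not in evidence:
                evidence.append(chunk)
        # Assess: stop as soon as the evidence is judged good enough.
        if is_sufficient(question, evidence):
            break
        # Refine: build a new query from what is still missing.
        query = refine(question, evidence)
    return evidence


# Toy stand-ins (hypothetical data and logic, for illustration only).
CORPUS = {
    "renewable energy grid": ["Solar and wind capacity is growing quickly."],
    "grid inertia frequency control": ["Inverter-based generation reduces system inertia."],
}


def toy_retrieve(query: str) -> List[str]:
    return CORPUS.get(query, [])


def toy_is_sufficient(question: str, evidence: List[str]) -> bool:
    return any("inertia" in chunk for chunk in evidence)


def toy_refine(question: str, evidence: List[str]) -> str:
    return "grid inertia frequency control"


evidence = agentic_retrieve(
    "renewable energy grid", toy_retrieve, toy_is_sufficient, toy_refine
)
print(evidence)
```

The max_rounds cap matters in practice: without it, a judge that is never satisfied would keep the loop (and the API bill) running.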

The create_tool_calling_agent function is key here. It binds the schemas of the available tools (in our case, retrieve_documents) to the LLM, so the model can respond with a structured tool call instead of plain text. The prompt_template instructs the LLM on how to use these tools, specifically mentioning refining retrieval. The AgentExecutor then manages the turn-by-turn interaction: the LLM decides what to do (call a tool or answer), the tool executes, and the result is fed back to the LLM for the next decision.
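To make that turn-by-turn interaction concrete, here is a simplified model of an executor loop. It is a sketch under stated assumptions and not the actual AgentExecutor code: Decision, run_agent, scripted_model, and fake_retrieve are all hypothetical names, and the "model" is a scripted function rather than an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Decision:
    """What the model wants to do next: call a tool or give a final answer."""
    tool: Optional[str] = None
    tool_input: str = ""
    final_answer: Optional[str] = None


Scratchpad = List[Tuple[Decision, str]]


def run_agent(
    decide: Callable[[str, Scratchpad], Decision],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_iterations: int = 5,
) -> str:
    scratchpad: Scratchpad = []  # history of (decision, observation) pairs
    for _ in range(max_iterations):
        decision = decide(question, scratchpad)
        if decision.final_answer is not None:
            return decision.final_answer
        # Execute the requested tool and feed the result back to the model.
        observation = tools[decision.tool](decision.tool_input)
        scratchpad.append((decision, observation))
    return "Stopped: iteration limit reached."


def scripted_model(question: str, scratchpad: Scratchpad) -> Decision:
    # Hypothetical stand-in for the LLM: retrieve once, then answer.
    if not scratchpad:
        return Decision(tool="retrieve_documents", tool_input=question)
    return Decision(final_answer=f"Based on: {scratchpad[-1][1]}")


def fake_retrieve(query: str) -> str:
    return f"[documents matching '{query}']"


answer = run_agent(
    scripted_model, {"retrieve_documents": fake_retrieve}, "grid stability"
)
print(answer)
```

The scratchpad is the important piece: each decision sees every earlier tool call and its result, which is what lets a real model notice that a first retrieval fell short.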

The retrieve_documents tool itself is a thin wrapper around the retriever returned by vectorstore.as_retriever(): it runs the query and joins the page_content of the returned chunks into a single string. The magic isn’t in the retrieval function itself, but in the LLM’s ability to decide when and how to call it again with a different query. For instance, if the initial retrieval yields documents about solar panel installation but the user asked about grid stability, the agent might infer it needs to search for terms like "intermittency," "frequency control," or "inertia" in its next retrieval attempt.
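A toy example shows why the wording of the query matters so much. The sketch below swaps the vector store for a crude keyword-overlap ranker (an assumption made to keep the example dependency-free; real embeddings behave differently in detail), and the three chunks are invented for illustration.

```python
import re
from typing import List, Set

# Invented chunks standing in for a real knowledge base.
CHUNKS = [
    "Rooftop solar panel installation requires a structural assessment.",
    "Low inertia makes frequency control harder when wind and solar dominate.",
    "Battery storage can smooth intermittency from variable generation.",
]


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_retrieve(query: str, k: int = 1) -> List[str]:
    """Rank chunks by how many query words they share (a stand-in for vector similarity)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        CHUNKS, key=lambda chunk: len(query_tokens & tokenize(chunk)), reverse=True
    )
    return ranked[:k]


# First attempt: the phrasing pulls in the installation chunk.
first = toy_retrieve("solar panel grid stability")
# Refined attempt: domain terms steer retrieval toward the stability chunk.
second = toy_retrieve("inertia frequency control intermittency")

print(first)
print(second)
```

The retriever is identical in both calls; only the query changed, which is exactly the lever the agent pulls.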

One aspect that often surprises people is how the LLM chooses to refine its query. It’s not explicitly programmed with a list of follow-up terms. Instead, it uses its understanding of the original question, the retrieved content, and its general knowledge to infer what kind of information is missing or what aspect of the topic needs deeper exploration. If the initial documents discuss the benefits of renewables but not the challenges to grid stability, the LLM might decide to re-query for terms related to "grid integration challenges" or "balancing supply and demand with variable generation." This emergent behavior is a hallmark of agentic systems.

The next step in complexity is implementing multi-hop retrieval, where the agent might need to perform several sequential retrievals, each building on the information found in the previous step.
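As a rough preview of what multi-hop means, the sketch below chains two lookups, with the second query derived from the first result. Everything here is hypothetical: the KB dictionary, multi_hop, and toy_plan_next (a string-matching stand-in for the LLM deciding what to look up next).

```python
from typing import Callable, List, Optional

# Invented two-entry knowledge base, keyed by exact query string.
KB = {
    "what limits renewable penetration": "A key limit is declining system inertia.",
    "declining system inertia": "Grid-forming inverters can provide synthetic inertia.",
}


def lookup(query: str) -> str:
    return KB.get(query.lower(), "")


def multi_hop(
    question: str,
    retrieve: Callable[[str], str],
    plan_next: Callable[[str, List[str]], Optional[str]],
    max_hops: int = 3,
) -> List[str]:
    """Run sequential retrievals, each query built from earlier findings."""
    findings: List[str] = []
    query: Optional[str] = question
    for _ in range(max_hops):
        result = retrieve(query)
        if not result:
            break
        findings.append(result)
        query = plan_next(question, findings)
        if query is None:  # planner sees nothing further to look up
            break
    return findings


def toy_plan_next(question: str, findings: List[str]) -> Optional[str]:
    # Hypothetical planner: follow up on whatever the last finding names as the limit.
    marker = "limit is "
    last = findings[-1]
    if marker in last:
        return last.split(marker, 1)[1].rstrip(".")
    return None


findings = multi_hop("what limits renewable penetration", lookup, toy_plan_next)
for hop, text in enumerate(findings, start=1):
    print(hop, text)
```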
