Callbacks are how LangChain lets you peer inside the execution of your chains and agents, giving you a window into every step, every LLM call, and every tool use.

Let’s see this in action. Imagine you have a simple chain that asks a question and then uses an LLM to answer it.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Define the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("human", "{question}"),
    ]
)

# Define the LLM
llm = ChatOpenAI(model="gpt-3.5-turbo")

# Create the chain: format the prompt, call the LLM, and store the reply text
# under "answer" next to the original "question"
from langchain_core.output_parsers import StrOutputParser

chain = RunnablePassthrough.assign(answer=prompt | llm | StrOutputParser())

# Now, let's add a callback handler
from langchain_core.callbacks import StdOutCallbackHandler

handler = StdOutCallbackHandler()

# And invoke the chain with the handler
question = "What is the capital of France?"
response = chain.invoke({"question": question}, config={"callbacks": [handler]})

print(f"\nFinal Answer: {response['answer']}")

When you run this, the StdOutCallbackHandler prints a log as the chain executes. Out of the box it reports chain-level events (entering and finishing each chain) plus agent actions and tool output; it does not print the LLM prompt or response. Those are delivered through other hooks such as on_llm_start (or on_chat_model_start for chat models like ChatOpenAI), on_llm_end, and on_chain_end, which you can capture by writing your own handler. Either way, this isn’t just a print statement; it’s a structured stream of events that represent the lifecycle of your LangChain execution.

The core problem callbacks solve is observability. When you build complex chains, especially those involving agents that can use multiple tools, understanding why something failed or produced a certain output becomes incredibly difficult without a way to trace the execution. Callbacks provide that trace. They allow you to intercept and log information at various points: at the start and end of a chain, before and after an LLM call, before and after a tool is used, and so on.

Internally, LangChain is designed with event hooks. When a specific action occurs (like starting a chain or invoking an LLM), the system emits an event. Callback handlers are objects that you register with the LangChain runtime, and they have methods corresponding to these events (e.g., on_llm_start, on_tool_end). When an event is emitted, LangChain calls the appropriate method on all registered handlers. This decoupling means you can create custom handlers for logging, monitoring, tracing, or even for implementing complex retry logic without modifying the core chain logic itself.
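To make the dispatch idea concrete, here is a toy sketch in plain Python. It is not LangChain's actual implementation: BaseHandler, LoggingHandler, CountingHandler, emit, and fake_llm are all hypothetical names invented for illustration, and the "LLM" simply upper-cases its input. It only shows how one emitted event can fan out to every registered handler while the core logic stays unaware of what those handlers do.

```python
class BaseHandler:
    """Hypothetical base class: every hook is a no-op unless overridden."""

    def on_llm_start(self, prompt):
        pass

    def on_llm_end(self, response):
        pass


class LoggingHandler(BaseHandler):
    """Collects a human-readable line for each event."""

    def __init__(self):
        self.lines = []

    def on_llm_start(self, prompt):
        self.lines.append(f"start: {prompt}")

    def on_llm_end(self, response):
        self.lines.append(f"end: {response}")


class CountingHandler(BaseHandler):
    """Only cares about how many calls were started."""

    def __init__(self):
        self.calls = 0

    def on_llm_start(self, prompt):
        self.calls += 1


def emit(handlers, event, *args):
    # Call the method named after the event on every registered handler.
    for handler in handlers:
        getattr(handler, event)(*args)


def fake_llm(prompt, handlers):
    # Stand-in for a model call: emits events around a trivial transformation.
    emit(handlers, "on_llm_start", prompt)
    response = prompt.upper()
    emit(handlers, "on_llm_end", response)
    return response


logger, counter = LoggingHandler(), CountingHandler()
result = fake_llm("hello", [logger, counter])
print(result, logger.lines, counter.calls)
```

Note that fake_llm never references logging or counting directly; adding a third handler would require no change to it.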

The config dictionary is where the magic happens. When you invoke a runnable (like a chain or a single LLM call), you can pass a config dictionary. Its "callbacks" key accepts a list of callback handler instances, and LangChain then routes the execution events of that run, including its nested steps, through these handlers. This makes it very flexible: you can have multiple handlers active simultaneously, each performing a different task. One logs to the console, another sends metrics to Datadog, and a third stores detailed traces in LangSmith.

What many people don’t realize is that the run_id and parent_run_id parameters passed into callback methods are crucial for reconstructing the exact lineage of execution in complex agentic loops. Every run gets its own run_id, and parent_run_id points at the run that started it. When an agent decides to use a tool, and that tool invocation itself triggers sub-chains or other LLM calls, these IDs link the whole operation into a tree (a simple directed acyclic graph) rooted at the top-level run. You can use these IDs to group all related events together, even if they span multiple LLM calls or tool uses, effectively creating a detailed, traceable history of a single user request from start to finish.
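A minimal sketch of that reconstruction, using only the standard library: the event records below are made up for illustration (in practice a custom handler would collect run_id and parent_run_id from the keyword arguments of each callback), and build_run_tree is a hypothetical helper, not a LangChain function.

```python
from collections import defaultdict
from uuid import uuid4


def build_run_tree(events):
    """Return indented lines showing each run nested under its parent."""
    children = defaultdict(list)
    for event in events:
        children[event["parent_run_id"]].append(event)

    def render(parent_id, depth):
        lines = []
        for event in children.get(parent_id, []):
            lines.append("  " * depth + event["name"])
            lines.extend(render(event["run_id"], depth + 1))
        return lines

    # Top-level runs are the ones without a parent.
    return render(None, 0)


# Made-up events for one agent request; the IDs are random UUIDs.
agent_id, llm_a, tool_id, llm_b = uuid4(), uuid4(), uuid4(), uuid4()
events = [
    {"name": "agent", "run_id": agent_id, "parent_run_id": None},
    {"name": "llm: choose tool", "run_id": llm_a, "parent_run_id": agent_id},
    {"name": "tool: search", "run_id": tool_id, "parent_run_id": agent_id},
    {"name": "llm: summarize results", "run_id": llm_b, "parent_run_id": tool_id},
]

tree_lines = build_run_tree(events)
print("\n".join(tree_lines))
```

The nesting in the printed output mirrors the call structure: the second LLM call appears under the tool that triggered it, not directly under the agent.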

The next step is to explore how to integrate these callbacks with external monitoring and tracing platforms for production environments.
