The most surprising thing about memory in a LangChain agent is that it’s not only about remembering past conversations, but also about remembering past tool calls and their results.

Let’s see this in action. Imagine an agent that needs to find a user’s email address and then send a message to it.

from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain.tools import tool

@tool
def get_user_email(user_id: str) -> str:
    """Gets the email address for a given user ID."""
    # In a real app, this would query a database or API
    if user_id == "user123":
        return "user123@example.com"
    return "unknown@example.com"

@tool
def send_email(recipient: str, subject: str, body: str) -> str:
    """Sends an email to a specified recipient."""
    print(f"Sending email to: {recipient}")
    print(f"Subject: {subject}")
    print(f"Body: {body}")
    return "Email sent successfully."

tools = [get_user_email, send_email]

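prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Respond to user queries."),
    # Earlier turns of the conversation are injected here
    ("placeholder", "{chat_history}"),
    ("user", "{input}"),
    # Tool calls and results from the current run are injected here
    ("placeholder", "{agent_scratchpad}"),
])

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

# Each invoke call starts fresh, so the history is kept and passed in explicitly
chat_history = []

question = "What is user123's email?"
response = agent_executor.invoke({"input": question, "chat_history": chat_history})
print(response)

# Record the turn so the next call can resolve "them"
chat_history += [("human", question), ("ai", response["output"])]

response = agent_executor.invoke({
    "input": "Send them a message saying 'Hello there!' with the subject 'Greeting'.",
    "chat_history": chat_history,
})
print(response)

Notice how the second invoke call doesn’t need to be told user123@example.com again. Because the first turn is passed back in as chat_history, the agent can resolve "them" to the address that get_user_email returned. Without that history, the second call would have no way of knowing who "them" is.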

LangChain’s memory is designed to overcome the stateless nature of LLM calls. Each LLM call is independent; the model has no inherent context of previous interactions. Memory bridges this gap by providing the LLM with a structured representation of what has transpired. This isn’t just about storing raw text; it’s about transforming raw chat history into a format that the agent can understand and use to inform its next decision. In an agent prompt, two placeholders carry this context: a messages placeholder (conventionally named chat_history) receives earlier turns of the conversation, while agent_scratchpad receives the tool calls and results from the current run.
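The idea that memory amounts to replaying earlier turns to a stateless model can be sketched in plain Python. This is a minimal illustration, not LangChain code: stateless_model is a hypothetical stand-in for an LLM call, and ask is a hypothetical helper that plays the role of a memory object.

```python
Message = tuple[str, str]  # (role, content)

def stateless_model(messages: list[Message]) -> str:
    """Hypothetical stand-in for an LLM call: it sees only what it is given."""
    seen = " ".join(content for _, content in messages)
    if "user123@example.com" in seen:
        return "I will write to user123@example.com."
    return "I don't know who 'them' refers to."

def ask(model, history: list[Message], user_input: str) -> str:
    """Send the stored history plus the new input, then record the turn."""
    reply = model(history + [("human", user_input)])
    history.append(("human", user_input))
    history.append(("ai", reply))
    return reply

# Without memory: the follow-up arrives with no context.
no_memory_reply = stateless_model([("human", "Send them a greeting.")])

# With memory: the earlier turn is replayed to the model on every call.
history: list[Message] = [
    ("human", "What is user123's email?"),
    ("ai", "It is user123@example.com."),
]
with_memory_reply = ask(stateless_model, history, "Send them a greeting.")

print(no_memory_reply)
print(with_memory_reply)
```

The model function never changes between the two calls; only the input it receives does. Every memory type described next is a different policy for deciding what goes into that input.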

There are several types of memory, each suited for different scenarios:

  • ChatMessageHistory: This is the most basic form. It simply stores a list of BaseMessage objects (like HumanMessage and AIMessage). It’s useful when you just need a chronological log of the conversation, but the agent doesn’t need to reason over past turns in a complex way. It’s essentially a raw transcript.

  • ConversationBufferMemory: This memory type stores the entire conversation history in a buffer. When the LLM needs context, the entire history is passed to it. This preserves every detail and works well for short conversations, but it becomes expensive and slow as the conversation grows, since more tokens are sent to the LLM with each turn and the history can eventually exceed the model’s context window.

  • ConversationBufferWindowMemory: A variation of ConversationBufferMemory, this keeps only the last k interactions. This is a pragmatic solution to the cost and performance issues of the full buffer, ensuring that recent context is preserved without overwhelming the LLM. The k value is a direct lever to control how much history is kept.

  • ConversationSummaryMemory: This memory type uses an LLM to summarize the conversation as it progresses. Instead of sending the full transcript, it sends a concise summary. This is highly effective for very long conversations where detailed recall of every turn isn’t necessary, but the overarching themes and decisions are. The trade-off is that the summarization process itself incurs LLM costs and can lose granular details.

  • ConversationSummaryBufferMemory: A hybrid approach. It keeps a buffer of recent messages verbatim and, once the buffer exceeds a token limit (max_token_limit), folds the oldest messages into a running summary. This aims to combine the benefits of both buffer and summary memory: immediate context from recent turns and thematic understanding from older ones.

  • VectorStoreRetrieverMemory: This is where things get powerful. Instead of just storing messages chronologically, it stores them as embeddings in a vector database. When context is needed, it retrieves the most relevant past interactions based on semantic similarity to the current input. This is ideal for agents that need to recall specific pieces of information from a vast history, rather than just general conversational flow. It requires setting up a vector store (like Chroma, FAISS, Pinecone) and an embedding model.
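The difference between these strategies is easiest to see when they are applied to the same history. The sketch below is plain Python rather than the LangChain classes, and every name in it (full_buffer, window, retrieve) is hypothetical. The word-count "embedding" is a toy assumption standing in for a real embedding model and vector store, and summary memory is left out because it needs an LLM call.

```python
import math
from collections import Counter

history = [
    ("human", "My order number is 4417."),
    ("ai", "Thanks, I have noted order 4417."),
    ("human", "What is the weather like today?"),
    ("ai", "I cannot check the weather."),
    ("human", "Tell me a joke."),
    ("ai", "Why did the token cross the context window?"),
]

def full_buffer(messages):
    """Buffer strategy: send everything, so cost grows with every turn."""
    return list(messages)

def window(messages, k):
    """Window strategy: keep the last k exchanges (2 messages per exchange)."""
    return list(messages[-2 * k:]) if k > 0 else []

def _vector(text):
    """Toy embedding: lowercase word counts (a real setup uses an embedding model)."""
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())

def _cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(messages, query, top_n):
    """Retrieval strategy: return the top_n messages most similar to the query."""
    q = _vector(query)
    ranked = sorted(messages, key=lambda m: _cosine(_vector(m[1]), q), reverse=True)
    return ranked[:top_n]

recent = window(history, k=1)
relevant = retrieve(history, "Which order number did I give you?", top_n=2)

print(recent)
print(relevant)
```

With k=1 the window holds only the joke exchange, so the order number has already fallen out of context. The retrieval strategy brings back the two order-related messages even though they are the oldest in the history, which is the trade-off the list above describes.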

The agent_scratchpad is the magic conduit for tool activity. When you use create_tool_calling_agent or similar agent creation functions, LangChain automatically manages this scratchpad. During a run, AgentExecutor records each tool call and its result as an intermediate step, and the agent formats those steps into messages (or, for text-based agents, a string) that the LLM can interpret as past actions and observations. This formatting is critical: it distinguishes between what the user said, what the agent decided to do (tool calls), and what the result of those tool calls was.

A common pitfall is not understanding how the agent_scratchpad is populated. It’s not a dump of the ChatMessageHistory. It is rebuilt from the intermediate steps of the current run and starts empty on every invoke call, so tool results carry over to a later call only if they are also recorded in the conversation history. With a tool-calling agent, each step becomes an AI message containing the tool call, followed by a tool message containing the result. With a text-based ReAct agent, a past tool call appears in the scratchpad roughly as:

Action: get_user_email
Action Input: user123
Observation: user123@example.com

This explicit representation allows the LLM to "see" what happened, not just read a transcript. The agent’s internal reasoning loop uses this to decide its next step.
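How recorded tool steps might be turned into scratchpad text can be sketched in a few lines. This is a simplified illustration under stated assumptions, not LangChain’s implementation: ToolStep and format_scratchpad are hypothetical names, and the Action / Action Input / Observation labels follow the ReAct text convention rather than the message format that tool-calling agents use.

```python
from dataclasses import dataclass

@dataclass
class ToolStep:
    """Hypothetical record of one tool call made during the current run."""
    tool: str
    tool_input: str
    observation: str

def format_scratchpad(steps):
    """Render each (tool call, result) pair as text the model can continue from."""
    lines = []
    for step in steps:
        lines.append(f"Action: {step.tool}")
        lines.append(f"Action Input: {step.tool_input}")
        lines.append(f"Observation: {step.observation}")
    # End with a cue so the model produces its next reasoning step.
    lines.append("Thought: ")
    return "\n".join(lines)

steps = [ToolStep("get_user_email", "user123", "user123@example.com")]
scratchpad = format_scratchpad(steps)
print(scratchpad)
```

An empty list of steps yields only the trailing cue, which mirrors how the scratchpad starts out empty at the beginning of each run.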

The next concept you’ll likely grapple with is how to manage the state of these memories across multiple user sessions or persistent agent runs.
