The LangChain Parent Document Retriever eases the trade-off between retrieval precision and context: it searches over small chunks but returns the larger chunks they were cut from.

Let’s see it in action. Imagine you have a long document, like a book chapter, that you want to retrieve information from. If you chunk it into small, bite-sized pieces (say, 100 tokens each), you might lose the overarching context. When you search for "what was the protagonist’s motivation for leaving home?", a small chunk might only contain "He packed his bags." That’s not very helpful.

Here’s how the Parent Document Retriever helps. Alongside the small chunks, it stores a larger "parent" chunk for each group of small chunks that were split from it. When you query, it first searches for the small chunks that are most relevant to your query. Then, instead of returning those small chunks, it returns the parent chunks they belong to. This gives your large language model (LLM) much more context to work with.

from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Note: these import paths match older LangChain releases. Newer releases move
# them into packages such as langchain_community, langchain_openai, and
# langchain_text_splitters.

# Load a long document
loader = TextLoader("./my_long_document.txt")
docs = loader.load()

# Splitter for the small child chunks that get embedded and searched
# (chunk_size is measured in characters by default, not tokens)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0)

# Splitter for the larger parent chunks that get returned
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# Vector store that indexes the child chunks (in-memory Chroma by default)
vectorstore = Chroma(
    collection_name="parent_doc_retriever_demo",
    embedding_function=OpenAIEmbeddings(),
)

# Set up the retriever
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    # Key-value store that holds the parent chunks
    docstore=InMemoryStore(),
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
    # How many child chunks the vector store search returns
    search_kwargs={"k": 5},
)

# Add the documents: parents go to the docstore, children to the vector store
retriever.add_documents(docs)

# A query first finds relevant child chunks, then returns their parent chunks.
query = "What was the protagonist's motivation for leaving home?"
# Newer releases prefer retriever.invoke(query)
retrieved_docs = retriever.get_relevant_documents(query)

# retrieved_docs contains the larger parent chunks, providing more context.
print(retrieved_docs)

The core problem this solves is the inherent trade-off between retrieval granularity and contextual richness. Small chunks match a specific query precisely but lack the surrounding information needed for a comprehensive answer. Large chunks retain context, but their embeddings blend many topics, so they match specific details poorly. The Parent Document Retriever creates a two-tiered system: it uses small chunks for precise matching and then returns the larger, contextually rich parent chunks for answering.

Internally, it operates on two sets of documents: the "child" documents (small chunks) and the "parent" documents (larger chunks). Only the child documents are embedded and indexed in the vector store. When a query comes in, the retriever finds the most similar child documents. Then, for each matched child, it looks up the corresponding parent document in a separate key-value document store (like an InMemoryStore). The final output is a list of these parent documents, which you then pass to your LLM for synthesis.
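To make that flow concrete, here is a minimal sketch in plain Python with no LangChain involved. Everything in it is an illustrative assumption: the ToyParentRetriever class and its helpers are hypothetical, chunk sizes are counted in words, and a bag-of-words overlap stands in for embedding similarity.

```python
from collections import Counter


def split_words(text: str, size: int) -> list[str]:
    """Cut text into consecutive pieces of at most `size` words."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def bag(text: str) -> Counter:
    """Bag-of-words stand-in for an embedding."""
    return Counter(w.strip(".,?!\"'").lower() for w in text.split())


def overlap(a: Counter, b: Counter) -> int:
    """Number of shared word occurrences; higher means more similar."""
    return sum((a & b).values())


class ToyParentRetriever:
    """Hypothetical toy: search small chunks, return their parents."""

    def __init__(self, parent_size: int, child_size: int) -> None:
        self.parent_size = parent_size
        self.child_size = child_size
        self.parents = {}    # parent id -> parent text (the "docstore")
        self.children = []   # (child bag, parent id) pairs (the "vector store")

    def add_document(self, text: str) -> None:
        for parent in split_words(text, self.parent_size):
            parent_id = len(self.parents)
            self.parents[parent_id] = parent
            # Children are cut from the parent, so each child has one parent
            for child in split_words(parent, self.child_size):
                self.children.append((bag(child), parent_id))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = bag(query)
        ranked = sorted(self.children, key=lambda c: overlap(q, c[0]), reverse=True)
        parent_ids = []
        for _, parent_id in ranked[:k]:
            if parent_id not in parent_ids:  # several hits may share a parent
                parent_ids.append(parent_id)
        return [self.parents[pid] for pid in parent_ids]


toy = ToyParentRetriever(parent_size=12, child_size=4)
toy.add_document(
    "Tom quarreled with his father about the farm. That night he left. "
    "The train reached the city at dawn. Rain fell all day long."
)
result = toy.retrieve("Why did he leave that night?", k=2)
print(result)
```

The query matches the child "That night he left.", which says nothing about the reason, but the parent that comes back also carries the quarrel that explains it.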

The exact levers you control are primarily the chunk_size and chunk_overlap for both the child_splitter and parent_splitter. A smaller chunk_size for the child splitter means more precise retrieval but less context in each child. A larger chunk_size for the parent splitter means more context is available in the retrieved parent documents. The number of child matches considered is set through the vector store search arguments (search_kwargs, for example {"k": 5}). There is no separate setting for the number of parents: the retriever returns the unique parents of those child matches, so you get at most as many parents as child hits.
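A rough way to see how the two chunk sizes interact is to count chunks. The sketch below assumes fixed-width, zero-overlap splitting on a hypothetical 10,000-character document; chunk_counts is an illustrative helper, not part of LangChain. A real RecursiveCharacterTextSplitter breaks on separators such as paragraphs and sentences, so its chunks are usually shorter than chunk_size and the real counts will be somewhat higher.

```python
import math


def chunk_counts(doc_chars: int, parent_size: int, child_size: int) -> dict:
    """Estimate chunk counts for fixed-width splitting with no overlap."""
    full_parents, remainder = divmod(doc_chars, parent_size)
    # Each full parent is cut into ceil(parent_size / child_size) children
    children = full_parents * math.ceil(parent_size / child_size)
    if remainder:
        # A shorter final parent still contributes its own children
        children += math.ceil(remainder / child_size)
    parents = full_parents + (1 if remainder else 0)
    return {"parents": parents, "children": children}


baseline = chunk_counts(10_000, parent_size=1000, child_size=200)
finer_children = chunk_counts(10_000, parent_size=1000, child_size=100)
broader_parents = chunk_counts(10_000, parent_size=2000, child_size=200)

print(baseline)         # {'parents': 10, 'children': 50}
print(finer_children)   # {'parents': 10, 'children': 100}
print(broader_parents)  # {'parents': 5, 'children': 50}
```

Halving the child size doubles the number of vectors to embed and search without changing what is returned, while doubling the parent size leaves the index untouched and doubles how much text each hit hands to the LLM.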

What most people miss is how the two tiers are linked. The vector store never holds the parent text; each child document carries its parent’s ID in its metadata (under the doc_id key by default). When a child document is found to be relevant, the retriever uses that ID to fetch the full parent document from the docstore. This is how it achieves the separation of indexing (on small chunks) and retrieval (of large chunks) without needing two separate vector stores for the same underlying data.
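The lookup step can be sketched with plain dictionaries. The child hits, IDs, and texts below are made up for illustration, and parents_for is a hypothetical helper; the "doc_id" key is assumed to mirror the retriever's default metadata key.

```python
ID_KEY = "doc_id"  # assumed metadata key linking a child to its parent

# Hypothetical ranked child hits, shaped like what a vector store search returns
child_hits = [
    {"text": "He packed his bags.", "metadata": {ID_KEY: "p-2"}},
    {"text": "His father sold the farm.", "metadata": {ID_KEY: "p-1"}},
    {"text": "He left before dawn.", "metadata": {ID_KEY: "p-2"}},
]

# Hypothetical docstore: parent ID -> full parent text
docstore = {
    "p-1": "Parent chunk one: the family loses the farm...",
    "p-2": "Parent chunk two: the night he decides to leave...",
}


def parents_for(hits: list, store: dict) -> list:
    """Collect unique parent IDs in rank order, then fetch the parents."""
    parent_ids = []
    for hit in hits:
        parent_id = hit["metadata"].get(ID_KEY)
        if parent_id is not None and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    # Skip IDs that are missing from the store instead of failing
    return [store[pid] for pid in parent_ids if pid in store]


parents = parents_for(child_hits, docstore)
print(len(child_hits), "child hits ->", len(parents), "parents")
```

Three child hits collapse into two parents here, which is why the number of documents you get back can be smaller than the k you asked the vector store for.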

After successfully implementing this, your next challenge will likely be tuning the retrieval process to balance precision and recall across the different chunk sizes.
