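LlamaIndex’s auto-merging retrieval is a technique for improving retrieval accuracy with hierarchical chunks. The document is indexed as a tree of chunks, the smallest chunks are retrieved, and they are merged back into their larger parent chunk when enough of that parent’s children match the query.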
Let’s see it in action. Imagine you have a long document, say a research paper, and you want to retrieve specific information. A naive approach would be to chunk the document into fixed-size pieces and retrieve based on those. However, this might split crucial context across multiple chunks or miss the broader theme. Auto-merging retrieval addresses this.
Here’s a simplified setup:
```python
import os

from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import AutoMergingRetriever

# Assume you have a document named 'research_paper.txt' in a 'data' directory.
# For demonstration, let's simulate one. The default embedding model and LLM
# are OpenAI's, so OPENAI_API_KEY must be set (or configure llama_index.core.Settings).
os.makedirs("data", exist_ok=True)
with open("data/research_paper.txt", "w") as f:
    f.write("This is the first section. It introduces the main topic of our research. "
            "We will discuss novel approaches to data processing. "
            "The second section delves deeper into the methodology. "
            "Here, we detail the algorithms used and their parameters. "
            "This section is critical for understanding the results. "
            "The third section presents the experimental results. "
            "We show significant improvements over existing methods. "
            "Finally, the conclusion summarizes the findings and suggests future work. "
            "This includes extending the framework to other domains.")

# Load documents
documents = SimpleDirectoryReader("data").load_data()

# Build the chunk hierarchy. HierarchicalNodeParser splits each document into
# 2048-token chunks, each of those into 512-token chunks, and each of those into
# 128-token chunks, recording parent/child relationships between levels.
# (This short demo text fits into one chunk per level, so every level holds the
# same text; the benefit of merging only shows with a genuinely long document.)
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512, 128])
nodes = node_parser.get_nodes_from_documents(documents)
leaf_nodes = get_leaf_nodes(nodes)

# Store every level in the docstore so the retriever can look up parents later...
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

# ...but embed and index only the smallest (leaf) chunks.
index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)

# The base retriever returns the top-k leaf chunks for a query.
# vector_store_query_mode="hybrid" (plus sparse_top_k) only works with a vector
# store that supports hybrid search; the default in-memory store does not.
base_retriever = index.as_retriever(similarity_top_k=6)

# The auto-merging retriever replaces retrieved leaves with their parent when
# more than simple_ratio_thresh of that parent's children were retrieved.
# verbose=True prints a message whenever a merge happens.
retriever = AutoMergingRetriever(
    base_retriever,
    storage_context,
    simple_ratio_thresh=0.5,
    verbose=True,
)

# Create a query engine with the auto-merging retriever
query_engine = RetrieverQueryEngine.from_args(retriever)

# Now, let's query the index
response = query_engine.query("What are the main findings of the research?")
print(response)

response_detail = query_engine.query("Explain the methodology in detail.")
print(response_detail)
```
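The system solves the problem of information retrieval from long documents where context is crucial and fixed-size chunks are insufficient. It works in two phases. At indexing time, the document is split into a hierarchy of chunks (by default 2048, 512, and 128 tokens), every node is stored in the docstore with links to its parent and children, and only the smallest leaf chunks are embedded. At query time, the base retriever returns the top-k leaf chunks. For each parent of those leaves, the retriever computes the fraction of its children that were retrieved; if that fraction exceeds `simple_ratio_thresh` (0.5 by default), the children are replaced by the parent, which receives the average of their scores. The check repeats one level up until nothing more merges, so passages that the initial chunking split apart come back as one coherent parent chunk. No second query is issued: the merged set is simply sorted by score and returned.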
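The core merge rule can be sketched in plain Python: a parent replaces its retrieved children only when more than a threshold fraction of them was retrieved, and it takes the average of their scores. This is a minimal sketch, not LlamaIndex's implementation; the two-level hierarchy, the node ids, and the `merge_pass` helper are hypothetical, and the scores are plain floats standing in for similarity scores.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level hierarchy: parent id -> ids of its leaf children.
CHILDREN: Dict[str, List[str]] = {
    "methodology": ["m1", "m2", "m3"],
    "results": ["r1", "r2"],
}
# Reverse lookup: leaf id -> parent id.
PARENT: Dict[str, str] = {c: p for p, kids in CHILDREN.items() for c in kids}


def merge_pass(
    hits: List[Tuple[str, float]], ratio_thresh: float = 0.5
) -> List[Tuple[str, float]]:
    """One merge pass over (node_id, score) hits from a base retriever."""
    # Group the retrieved leaves by their parent.
    grouped: Dict[str, List[Tuple[str, float]]] = {}
    for node_id, score in hits:
        if node_id in PARENT:
            grouped.setdefault(PARENT[node_id], []).append((node_id, score))

    dropped = set()
    merged: List[Tuple[str, float]] = []
    for parent, kids_hit in grouped.items():
        # Merge only when MORE than ratio_thresh of the parent's children were hit.
        if len(kids_hit) / len(CHILDREN[parent]) > ratio_thresh:
            dropped.update(node_id for node_id, _ in kids_hit)
            avg = sum(score for _, score in kids_hit) / len(kids_hit)
            merged.append((parent, avg))

    # Keep unmerged hits, add the merged parents, and rank by score.
    survivors = [(n, s) for n, s in hits if n not in dropped]
    return sorted(survivors + merged, key=lambda h: h[1], reverse=True)


print(merge_pass([("m1", 0.9), ("m2", 0.7), ("r1", 0.6)]))
```

Here two of the three `methodology` children are hit (2/3 > 0.5), so they collapse into `methodology` with a score of about 0.8, while `r1` stays a leaf because exactly half of `results` was hit, which does not exceed the threshold.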
The core levers you control sit in the HierarchicalNodeParser, the base retriever, and the AutoMergingRetriever itself:
- `chunk_sizes` and `chunk_overlap` (in `HierarchicalNodeParser.from_defaults`): These define the levels of the hierarchy, from largest to smallest. The smallest size sets the granularity of what gets embedded: small leaves capture specific details, while the larger levels supply the broader context that merging can restore.
- `similarity_top_k` (in the base retriever): This sets how many leaf chunks are retrieved before merging. Higher values give sibling chunks a better chance of being retrieved together, which makes merges more likely.
- `simple_ratio_thresh` (in `AutoMergingRetriever`): This is the fraction of a parent's children that must be retrieved before they are replaced by the parent (default 0.5). Lower it to merge more aggressively; raise it to keep results fine-grained.
- `vector_store_query_mode` (in the base retriever): This controls how the initial retrieval is performed. `default` is pure vector similarity; `hybrid` (with `sparse_top_k` for the keyword part) combines vector and keyword search, but it requires a vector store that supports it, which the default in-memory store does not.
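The effect of `chunk_sizes` is easiest to see on a toy hierarchy. The sketch below is a simplified stand-in for `HierarchicalNodeParser`, not its actual code: it counts words instead of tokens, ignores sentence boundaries and overlap, and the `build_hierarchy` helper and `Chunk` class are hypothetical. Each level re-splits the chunks of the level above, and only the last level (the leaves) would be embedded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    node_id: str
    words: List[str]
    level: int
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


def build_hierarchy(text: str, chunk_sizes: List[int]) -> Dict[str, Chunk]:
    """Split text into nested chunks, largest level first (sizes in words)."""
    nodes: Dict[str, Chunk] = {}
    counter = 0

    def split(words: List[str], level: int, parent_id: Optional[str]) -> None:
        nonlocal counter
        size = chunk_sizes[level]
        for start in range(0, len(words), size):
            piece = words[start:start + size]
            node_id = f"L{level}-{counter}"
            counter += 1
            nodes[node_id] = Chunk(node_id, piece, level, parent_id)
            # Link the new chunk to its parent, if it has one.
            if parent_id is not None:
                nodes[parent_id].child_ids.append(node_id)
            # Re-split this chunk at the next, smaller level.
            if level + 1 < len(chunk_sizes):
                split(piece, level + 1, node_id)

    split(text.split(), 0, None)
    return nodes


demo_text = " ".join(f"w{i}" for i in range(40))
hierarchy = build_hierarchy(demo_text, chunk_sizes=[20, 8, 4])
leaves = [c for c in hierarchy.values() if not c.child_ids]  # only these get embedded
print(len(hierarchy), "chunks,", len(leaves), "leaves")
```

With 40 words and `chunk_sizes=[20, 8, 4]`, this yields 2 top-level chunks, 6 middle chunks, and 10 leaves. Shrinking the last size produces more, finer leaves for the vector index; the larger levels only ever come back through merging.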
The one thing most people don’t realize is that, in the standard setup, the AutoMergingRetriever never embeds or searches parent chunks directly. Parents are only ever reached through their children: if the query hits enough small chunks that belong to the same larger unit, the retriever returns that unit instead of the fragments. This is how it overcomes the limitations of fixed-size chunking. The hierarchy is built once, in advance, but the granularity of each result is decided per query. It’s not just about having a hierarchy of pre-defined chunks; it’s about using the query to activate the right level of that hierarchy.
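The query-dependent behaviour can be sketched in a few lines: the same pre-built hierarchy yields leaf-level results for one query, a paragraph for another, and the whole section for a third, depending only on which leaves were hit. This is a simplified, hypothetical model (ids only, no scores; the `auto_merge` helper and the hierarchy are illustrative, not LlamaIndex code) that repeats the merge check level by level until nothing changes.

```python
from typing import Dict, List, Set

# Hypothetical three-level hierarchy: section -> paragraphs -> sentence leaves.
CHILDREN: Dict[str, List[str]] = {
    "section": ["para1", "para2"],
    "para1": ["s1", "s2", "s3"],
    "para2": ["s4", "s5", "s6"],
}


def auto_merge(hits: Set[str], ratio_thresh: float = 0.5) -> Set[str]:
    """Swap retrieved children for their parent, level by level, until stable."""
    current = set(hits)
    changed = True
    while changed:
        changed = False
        for parent, kids in CHILDREN.items():
            present = {k for k in kids if k in current}
            # Merge only when MORE than ratio_thresh of the parent's children are present.
            if present and len(present) / len(kids) > ratio_thresh:
                current = (current - present) | {parent}
                changed = True
    return current


# Three "queries" that hit different leaves of the same hierarchy.
print(sorted(auto_merge({"s1", "s4"})))              # scattered hits stay leaves
print(sorted(auto_merge({"s1", "s2"})))              # 2 of 3 -> para1
print(sorted(auto_merge({"s1", "s2", "s4", "s5"})))  # both paras -> section
```

Lowering `ratio_thresh` to 0.3 makes even the scattered hits `{"s1", "s4"}` climb all the way to `section`, which is the trade-off that `AutoMergingRetriever`'s `simple_ratio_thresh` parameter controls.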
The next step is to explore different NodeParser strategies and how they interact with various vector_store_query_mode settings.