LlamaIndex Auto-Merging Retrieval: Hierarchical Chunks (2026)

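LlamaIndex’s auto-merging retrieval improves retrieval quality by indexing a document as a hierarchy of chunks, searching only the smallest ("leaf") chunks, and replacing groups of retrieved sibling chunks with their larger parent chunk when enough of them match the query.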

Let’s see it in action. Imagine you have a long document, say a research paper, and you want to retrieve specific information. A naive approach would be to chunk the document into fixed-size pieces and retrieve based on those. However, this might split crucial context across multiple chunks or miss the broader theme. Auto-merging retrieval addresses this.

Here’s a simplified setup:

import os

from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.storage.docstore import SimpleDocumentStore

# Note: by default LlamaIndex uses OpenAI for both embeddings and the LLM,
# so OPENAI_API_KEY must be set (or configure other models via Settings).

# For demonstration, write a small stand-in document to data/research_paper.txt.
# A real paper would be far longer; a text this short yields only a few nodes.
os.makedirs("data", exist_ok=True)
with open("data/research_paper.txt", "w") as f:
    f.write("This is the first section. It introduces the main topic of our research. "
            "We will discuss novel approaches to data processing. "
            "The second section delves deeper into the methodology. "
            "Here, we detail the algorithms used and their parameters. "
            "This section is critical for understanding the results. "
            "The third section presents the experimental results. "
            "We show significant improvements over existing methods. "
            "Finally, the conclusion summarizes the findings and suggests future work. "
            "This includes extending the framework to other domains.")

# Load documents
documents = SimpleDirectoryReader("data").load_data()

# Build the hierarchy. This is the key step for auto-merging retrieval.
# HierarchicalNodeParser splits each document at several chunk sizes
# (in tokens, largest first) and links every node to its parent and children.
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512, 128])
nodes = node_parser.get_nodes_from_documents(documents)

# The leaf nodes are the smallest chunks; these are what gets embedded.
leaf_nodes = get_leaf_nodes(nodes)

# Store ALL nodes (parents included) in the docstore so the retriever can
# look up a parent when it decides to merge its children.
docstore = SimpleDocumentStore()
docstore.add_documents(nodes)
storage_context = StorageContext.from_defaults(docstore=docstore)

# Create the index over the leaf nodes only.
index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)

# Base retriever: plain vector search over the leaf nodes.
# `similarity_top_k` is how many leaf chunks to fetch before merging.
# Query modes such as "hybrid" need a vector store that supports them;
# the default in-memory store does not, so the default mode is used here.
base_retriever = index.as_retriever(similarity_top_k=10)

# Create the auto-merging retriever.
# It wraps the base retriever and uses the storage context to find parent nodes.
retriever = AutoMergingRetriever(
    base_retriever,
    storage_context,
    # Merge when more than this fraction of a parent's children is retrieved.
    simple_ratio_thresh=0.5,
    # Print each merge as it happens.
    verbose=True,
)

# Create a query engine with the auto-merging retriever
query_engine = RetrieverQueryEngine(retriever=retriever)

# Now, let's query the index
response = query_engine.query("What are the main findings of the research?")
print(response)

response_detail = query_engine.query("Explain the methodology in detail.")
print(response_detail)

This setup targets retrieval from long documents where context matters and fixed-size chunks fall short. At indexing time, the node parser splits each document into a tree: large parent chunks, each subdivided into smaller child chunks, down to the leaf chunks that are embedded and indexed. At query time, the base retriever fetches the top-scoring leaf chunks. The auto-merging retriever then groups those leaves by parent, and if the fraction of a parent's children that were retrieved exceeds a threshold (simple_ratio_thresh, 0.5 by default), it replaces them with the parent chunk. The check repeats up the tree until nothing more merges. The result is a set of larger, more coherent passages in place of fragments that the leaf-level chunking had split apart, and these merged nodes are what gets passed to the response synthesizer.
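The merging step can be illustrated without LlamaIndex at all. What follows is a minimal sketch, not the library's implementation: the function names (merge_once, auto_merge), the toy node IDs, and the dictionaries parent_of and children_of are all hypothetical, and the sketch assumes a strict "greater than" comparison against the ratio threshold.

```python
from collections import defaultdict


def merge_once(retrieved, parent_of, children_of, ratio_thresh=0.5):
    """One pass: swap groups of retrieved siblings for their parent."""
    # Group the retrieved node IDs by their parent.
    by_parent = defaultdict(list)
    for node_id in retrieved:
        parent = parent_of.get(node_id)
        if parent is not None:
            by_parent[parent].append(node_id)

    result = list(retrieved)
    changed = False
    for parent, hits in by_parent.items():
        # Fraction of this parent's children that were retrieved.
        ratio = len(hits) / len(children_of[parent])
        if ratio > ratio_thresh:  # assumption: strict comparison
            result = [n for n in result if n not in hits]
            if parent not in result:
                result.append(parent)
            changed = True
    return result, changed


def auto_merge(retrieved, parent_of, children_of, ratio_thresh=0.5):
    """Repeat merge passes until the result stops changing."""
    nodes, changed = list(retrieved), True
    while changed:
        nodes, changed = merge_once(nodes, parent_of, children_of, ratio_thresh)
    return nodes


# Toy three-level tree: root -> two sections -> four leaves each.
children_of = {
    "root": ["mid_a", "mid_b"],
    "mid_a": ["a1", "a2", "a3", "a4"],
    "mid_b": ["b1", "b2", "b3", "b4"],
}
parent_of = {c: p for p, kids in children_of.items() for c in kids}

# Three of four leaves under mid_a were hit, so they collapse into mid_a.
print(auto_merge(["a1", "a2", "a3", "b1"], parent_of, children_of))
# Both sections are well covered, so the merge cascades up to the root.
print(auto_merge(["a1", "a2", "a3", "b1", "b2", "b3"], parent_of, children_of))
```

In the first call only one of mid_b's four leaves was retrieved, so b1 stays a leaf; mid_a alone covers half of the root's children, which does not exceed the threshold, so the merge stops one level below the root.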

The core levers you control are in the HierarchicalNodeParser, the base retriever, and the AutoMergingRetriever itself:

chunk_sizes (in HierarchicalNodeParser): The chunk size in tokens at each level of the hierarchy, largest first. The last value sets the granularity of the leaf chunks that are searched. Smaller leaves capture specific details but carry less context on their own; the larger parents restore that context after merging.
chunk_overlap (in HierarchicalNodeParser): The number of tokens shared between adjacent chunks at the same level.
simple_ratio_thresh (in AutoMergingRetriever): The fraction of a parent's children that must be retrieved before they are replaced by the parent. The default is 0.5. A lower value merges more aggressively and returns broader context; a higher value keeps results closer to the individual leaf chunks.
similarity_top_k (in the base retriever): How many leaf chunks are retrieved before merging. A larger value makes it more likely that enough siblings are retrieved to trigger a merge.
vector_store_query_mode (in the base retriever): How the initial retrieval is performed. default is pure vector search; hybrid and sparse are available only when the underlying vector store supports them.
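To make the idea of chunk granularity at several levels concrete, here is a small pure-Python sketch of hierarchical chunking. It is an illustration under simplifying assumptions, not how LlamaIndex splits text: words stand in for tokens, there is no overlap and no sentence-boundary handling, and the names build_hierarchy and leaf_nodes are hypothetical.

```python
def split_words(words, size):
    """Cut a list of words into consecutive pieces of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def build_hierarchy(text, chunk_sizes):
    """Split text at each size in chunk_sizes (largest first).

    Returns a flat list of node dicts; each node records its level
    and the id of the parent chunk it was cut from.
    """
    nodes = []

    def recurse(words, level, parent_id):
        if level == len(chunk_sizes):
            return
        for piece in split_words(words, chunk_sizes[level]):
            node_id = len(nodes)
            nodes.append({
                "id": node_id,
                "level": level,
                "parent": parent_id,
                "text": " ".join(piece),
            })
            # Subdivide this chunk at the next, smaller size.
            recurse(piece, level + 1, node_id)

    recurse(text.split(), 0, None)
    return nodes


def leaf_nodes(nodes):
    """Leaves are the nodes that no other node names as its parent."""
    parent_ids = {n["parent"] for n in nodes}
    return [n for n in nodes if n["id"] not in parent_ids]


# Sixteen placeholder words split at sizes 8, 4, and 2.
text = " ".join(f"w{i}" for i in range(16))
nodes = build_hierarchy(text, chunk_sizes=[8, 4, 2])
for level in range(3):
    count = sum(1 for n in nodes if n["level"] == level)
    print(f"level {level}: {count} chunks")
print("leaves:", [n["text"] for n in leaf_nodes(nodes)])
```

Only the leaves would be embedded and searched; the two upper levels exist so that a retriever has something larger to return when several neighboring leaves match.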

The one thing most people don’t realize is that the AutoMergingRetriever never searches the parent nodes directly. Only the leaf nodes are embedded; the parents sit in the docstore and are pulled in when enough of their children are retrieved. If the initial query hits several small chunks under the same parent, the retriever returns that parent instead, so the size of the returned passage adapts to the query: a narrow question may come back as a few leaf chunks, while a broad one may come back as a whole section. The hierarchy itself is fixed at indexing time; what the query decides is which parts of it get activated and assembled.

The next step is to explore different NodeParser strategies and how they interact with various vector_store_query_mode settings.