The sentence window retrieval strategy in LlamaIndex doesn’t just find the best-matching sentence; it also returns a configurable window of the text surrounding that sentence, so the LLM sees richer context.
Let’s see this in action. Imagine we have a document about the lifecycle of a butterfly:
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Assume OPENAI_API_KEY is set in your environment
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding()

text = (
    "The monarch butterfly (Danaus plexippus) undergoes a complete metamorphosis. "
    "This process begins with the egg stage, where females lay eggs on milkweed plants. "
    "The egg hatches into a larva, commonly known as a caterpillar. "
    "The caterpillar's primary role is to eat and grow, molting several times as it increases in size. "
    "After reaching its full size, the caterpillar enters the pupa stage, forming a chrysalis. "
    "Inside the chrysalis, a remarkable transformation occurs. "
    "Finally, the adult butterfly emerges, ready to reproduce and start the cycle anew. "
    "Migration is another fascinating aspect of the monarch's life, with some populations traveling thousands of miles."
)
document = Document(text=text)

# Split the document into one node per sentence. Each node also stores
# a window of its neighboring sentences in its metadata.
# window_size=1 means 1 sentence before and 1 sentence after the matched
# sentence, so a full window holds (2 * window_size) + 1 sentences.
parser = SentenceWindowNodeParser.from_defaults(
    window_size=1,
    window_metadata_key="window",  # metadata key holding the surrounding window
    original_text_metadata_key="original_text",  # metadata key holding the single sentence
)
nodes = parser.get_nodes_from_documents([document])

# Build an index from these nodes. Only the single sentence is embedded;
# the window lives in metadata and is not part of the embedding.
index = VectorStoreIndex(nodes)

# Create a query engine. The vector retriever first fetches the top-k
# sentence nodes; the postprocessor then replaces each node's text with
# the window stored in its metadata before the text reaches the LLM.
query_engine = index.as_query_engine(
    similarity_top_k=5,  # How many sentence nodes to retrieve initially
    node_postprocessors=[
        # target_metadata_key must match window_metadata_key used by the parser
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
# Query the engine
response = query_engine.query("What happens after the caterpillar is done eating?")
print(response)
response_migration = query_engine.query("What do monarchs do in the fall?")
print(response_migration)
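For the first query, the context window around the best-matching sentence looks like this (the LLM’s own phrasing of the final answer may differ):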
The caterpillar's primary role is to eat and grow, molting several times as it increases in size. After reaching its full size, the caterpillar enters the pupa stage, forming a chrysalis. Inside the chrysalis, a remarkable transformation occurs.
Notice how it not only includes the sentence about the caterpillar entering the pupa stage but also the preceding sentence about its role and the subsequent sentence about the transformation inside the chrysalis. This provides more context than just a single sentence match.
The core idea is to move beyond isolated semantic matching to include the surrounding narrative. When you parse your documents with SentenceWindowNodeParser and specify window_size, you’re telling LlamaIndex to break the text into sentence-level nodes and to store, in each node’s metadata, the sentences that precede and follow it. When you then add a MetadataReplacementPostProcessor to your query engine, the pipeline works as follows:
- Perform initial retrieval: Find the similarity_top_k most relevant nodes (sentences, in this case) based on your query. Only the single sentence is embedded, so the match stays precise.
- Expand context: For each of these top nodes, replace the node’s text with the window of sentences that was stored in its metadata at parse time. This creates a "window" of context around the most relevant sentence.
- Pass to LLM: The LLM then receives these expanded text windows, offering a richer, more nuanced understanding of the retrieved information.
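The three steps above can be sketched without any library at all. The following is a minimal, illustrative stand-in rather than LlamaIndex’s implementation: the helper names (`split_sentences`, `build_nodes`, `score`, `retrieve_with_windows`) are hypothetical, the sentence splitter is a naive regex, and word overlap stands in for embedding similarity.

```python
import re

def split_sentences(text):
    # Naive splitter for illustration: break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_nodes(text, window_size=1):
    # One node per sentence; the surrounding window is stored alongside it,
    # mimicking a window kept in node metadata at parse time.
    sentences = split_sentences(text)
    nodes = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append({"text": sentence, "window": " ".join(sentences[lo:hi])})
    return nodes

def score(query, sentence):
    # Toy relevance score: number of shared lowercase words.
    # This is an assumption standing in for embedding similarity.
    def tokenize(s):
        return set(re.findall(r"[a-z']+", s.lower()))
    return len(tokenize(query) & tokenize(sentence))

def retrieve_with_windows(nodes, query, top_k=1):
    # Step 1: rank on the single sentence. Step 2: return the stored window instead.
    ranked = sorted(nodes, key=lambda n: score(query, n["text"]), reverse=True)
    return [n["window"] for n in ranked[:top_k]]

doc = (
    "The egg hatches into a larva. "
    "The caterpillar eats and grows. "
    "The caterpillar then forms a chrysalis. "
    "An adult butterfly emerges."
)
nodes = build_nodes(doc, window_size=1)
context = retrieve_with_windows(nodes, "When does the chrysalis form?", top_k=1)
print(context[0])
# The caterpillar eats and grows. The caterpillar then forms a chrysalis. An adult butterfly emerges.
```

Matching on the single sentence while returning the wider window is the point of the design: the match stays sharp, and the text handed to the LLM (step 3) carries its neighbors.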
The window_size parameter of SentenceWindowNodeParser is key. A value of 1 stores 1 sentence before, the matched sentence, and 1 sentence after (3 sentences in total). A value of 2 gives 2 before, the matched sentence, and 2 after (5 sentences in total). Sentences near the start or end of a document get a shorter window, because fewer neighbors are available.
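To make the arithmetic concrete, here is a small sketch that computes how many sentences land in each window for an 8-sentence document like the butterfly example. The functions `window_span` and `window_length` are hypothetical helpers written for this illustration, and the edge clipping is an assumption about how a window behaves at document boundaries.

```python
def window_span(index, num_sentences, window_size):
    # Half-open [start, end) range of sentence positions in the window,
    # clipped so it never runs past either end of the document.
    start = max(0, index - window_size)
    end = min(num_sentences, index + window_size + 1)
    return start, end

def window_length(index, num_sentences, window_size):
    start, end = window_span(index, num_sentences, window_size)
    return end - start

for w in (1, 2):
    print(w, [window_length(i, 8, w) for i in range(8)])
# 1 [2, 3, 3, 3, 3, 3, 3, 2]
# 2 [3, 4, 5, 5, 5, 5, 4, 3]
```

Interior sentences reach the full (2 * window_size) + 1 length, while the first and last sentences fall short of it.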
The most surprising thing about sentence window retrieval is how it uses the original document structure after initial semantic matching. It’s not just about finding semantically similar text; it’s about re-contextualizing that text within its original narrative flow by leveraging the sentence boundaries established during parsing. This bridges the gap between isolated semantic matches and coherent textual understanding.
The next concept to explore is how to handle very long documents where even a sentence window might not provide enough context. This leads into techniques like hierarchical indexing or using different retrieval strategies that combine sentence window with other methods.