Hybrid search in LlamaIndex isn’t just about running two search methods side by side; it’s about letting keyword matching and semantic understanding cover each other’s blind spots, so a query can succeed on exact terms, on meaning, or on both.

Let’s see it in action. Imagine we have a small document store:

# Import paths assume llama-index 0.10 or later.
# BM25 lives in a separate package: pip install llama-index-retrievers-bm25
# The default embedding model and LLM call OpenAI, so OPENAI_API_KEY must be set
# unless you configure other models through llama_index.core.Settings.
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever, VectorIndexRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Sample documents
documents = [
    Document(text="The quick brown fox jumps over the lazy dog."),
    Document(text="A fast, reddish-brown canine leaps over a lethargic hound."),
    Document(text="The lazy dog was sleeping soundly."),
    Document(text="Brown foxes are known for their agility."),
]

# Parse the documents into nodes once so both retrievers search the same nodes
nodes = SentenceSplitter().get_nodes_from_documents(documents)

# Create a vector store index over those nodes
vector_index = VectorStoreIndex(nodes)

# Semantic retriever backed by the vector index
vector_retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=2)

# Keyword retriever. BM25Retriever builds its own term index from the nodes,
# so there is no separate BM25 index object to create.
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=2)

# Combine retrievers. RetrieverQueryEngine accepts a single retriever,
# so the two are fused first with QueryFusionRetriever.
hybrid_retriever = QueryFusionRetriever(
    [bm25_retriever, vector_retriever],
    similarity_top_k=2,     # how many nodes are kept *after* merging
    num_queries=1,          # 1 = use the original query only, no LLM-generated variants
    mode="relative_score",  # rescale each retriever's scores before merging
    use_async=False,
)

# node_postprocessors=[...] can be passed here to filter the fused nodes further
query_engine = RetrieverQueryEngine.from_args(hybrid_retriever)

# Query
response = query_engine.query("tell me about the agile fox")
print(response)

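Here’s what’s happening: the BM25Retriever scores documents by which query terms they contain and how rare those terms are across the collection. It’s good at identifying documents with precise term matches, so "fox" pulls in "The quick brown fox jumps over the lazy dog." Whether "agile" also matches "agility" (and "fox" matches "foxes") depends on the tokenizer: with stemming they reduce to the same root, without it they are different terms. The VectorIndexRetriever, on the other hand, uses embeddings to capture the meaning behind "agile fox." It might find the document about the "fast, reddish-brown canine" because "fast" and "canine" are semantically close to "agile" and "fox," even though neither query word appears in it.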
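To see why exact term matching behaves this way, here is a minimal plain-Python sketch of the Okapi BM25 formula. It is not the code LlamaIndex runs; the tokenize and bm25_scores helpers are written for illustration, the tokenizer deliberately skips stemming, and only the key terms "agile fox" are scored so that common words like "the" don’t blur the picture.

```python
import math
import re
from collections import Counter

DOCS = [
    "The quick brown fox jumps over the lazy dog.",
    "A fast, reddish-brown canine leaps over a lethargic hound.",
    "The lazy dog was sleeping soundly.",
    "Brown foxes are known for their agility.",
]


def tokenize(text):
    """Lowercase and split on non-letters; deliberately no stemming."""
    return re.findall(r"[a-z]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(doc) for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq[term]
            if tf == 0:
                continue  # a term that is absent contributes nothing
            n = doc_freq[term]
            idf = math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


scores = bm25_scores("agile fox", DOCS)
for doc, score in zip(DOCS, scores):
    print(f"{score:.3f}  {doc}")
```

Without stemming, only the first document gets a non-zero score, because it is the only one containing the literal token "fox". The document about "foxes" and "agility" scores zero, which is exactly the gap the vector retriever is there to fill.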

When you combine them, the fusion step does the orchestration. RetrieverQueryEngine accepts a single retriever, so the two are first wrapped in a fusion retriever such as QueryFusionRetriever, which sends your query to both. The BM25Retriever might return the document "Brown foxes are known for their agility." The VectorIndexRetriever might return "A fast, reddish-brown canine leaps over a lethargic hound." The fusion retriever merges the two ranked lists, collapses duplicates, re-scores the nodes according to its mode, and keeps the top similarity_top_k. The RetrieverQueryEngine then passes those nodes to the LLM to synthesize the answer. A node postprocessor such as SimilarityPostprocessor can additionally drop nodes whose score falls below a similarity_cutoff; it filters results, it does not re-rank them.

The core problem hybrid search solves is the trade-off between recall and precision. Pure keyword search (like BM25) can be precise but might miss relevant documents if the exact keywords aren’t used (low recall). Pure vector search can have high recall by understanding synonyms and related concepts but might sometimes return results that are semantically close but not directly relevant to the query’s intent (lower precision, or "semantic drift"). Hybrid search aims for the best of both worlds: high recall and high precision.
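The trade-off is easier to see with numbers. The sketch below computes precision and recall for three result sets; the relevance judgments and the result lists are hypothetical, invented only to illustrate the arithmetic, and precision_recall is a helper written for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: which documents a human would call relevant.
relevant = {"doc1", "doc2", "doc4"}

# Hypothetical results from each strategy.
keyword_hits = ["doc1"]                 # exact term match only
vector_hits = ["doc2", "doc4", "doc3"]  # semantic matches plus one drifting result
hybrid_hits = ["doc1", "doc2", "doc4", "doc3"]  # union of the two

for name, hits in [("keyword", keyword_hits), ("vector", vector_hits), ("hybrid", hybrid_hits)]:
    precision, recall = precision_recall(hits, relevant)
    print(f"{name:8s} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the union lifts recall to 1.0, but the drifting result is still in the pool. That is why the ranking and cut-off applied after merging matter: they are what recover precision.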

Internally, LlamaIndex manages this by letting you define multiple retriever objects. Each retriever has its own method of indexing and searching the underlying data. A fusion retriever such as QueryFusionRetriever takes the list of retrievers, queries each one independently, and aggregates the results. Its mode argument is crucial here: it defines how the combined results are scored and ranked. The default simple mode merges the lists by raw score, which isn’t usually optimal because BM25 scores and embedding similarities live on different scales; modes such as relative_score and reciprocal_rerank exist to make the lists comparable. Filtering after fusion is handled by the query engine’s node_postprocessors argument.

The "magic" of hybrid search often lies in the configuration of the similarity_top_k parameter on each retriever and the final similarity_top_k on the fusion retriever. For instance, if your BM25 retriever is set to similarity_top_k=5, your vector retriever to similarity_top_k=5, and the fusion retriever to similarity_top_k=3, a query fetches up to 10 nodes (5 from each, fewer once duplicates are merged) and the top 3 of that combined pool are kept. Tuning these numbers is how you balance the influence of keyword matching versus semantic understanding. A higher top_k for the vector retriever gives semantic matches more chances to reach the pool, while a higher top_k for BM25 does the same for exact keyword matches. In the score-based fusion modes, the retriever_weights argument sets that balance directly.
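To make the pool arithmetic concrete, here is a plain-Python sketch of the fetch, merge, and cut steps. It is not LlamaIndex’s implementation: the min_max and fuse helpers, the document ids, and every score are made up for illustration, and the weighting scheme (min-max rescaling per retriever, then a weighted sum) is just one reasonable way to put the two lists on a common scale.

```python
def min_max(results):
    """Rescale one retriever's scores to [0, 1] so the lists are comparable."""
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        return [(doc_id, 1.0) for doc_id, _ in results]
    return [(doc_id, (score - low) / (high - low)) for doc_id, score in results]


def fuse(keyword_results, vector_results, keyword_k, vector_k, keyword_weight=0.5):
    """Return the merged candidate pool, best first, as (doc_id, fused_score)."""
    pool = {}
    sources = (
        (keyword_results, keyword_k, keyword_weight),
        (vector_results, vector_k, 1.0 - keyword_weight),
    )
    for results, top_k, weight in sources:
        # Each retriever contributes only its own top_k candidates.
        top = sorted(results, key=lambda r: r[1], reverse=True)[:top_k]
        if not top:
            continue
        for doc_id, score in min_max(top):
            # A document found by both retrievers accumulates both contributions.
            pool[doc_id] = pool.get(doc_id, 0.0) + weight * score
    return sorted(pool.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (doc_id, raw_score) results; BM25 and cosine scores use different scales.
keyword_results = [("a", 7.2), ("b", 5.1), ("c", 3.0), ("d", 1.4), ("e", 0.9), ("f", 0.2)]
vector_results = [("b", 0.91), ("g", 0.88), ("a", 0.80), ("h", 0.77), ("i", 0.70), ("j", 0.65)]

# 5 from each retriever, then keep the best 3 of the merged pool.
pool = fuse(keyword_results, vector_results, keyword_k=5, vector_k=5)
top_3 = pool[:3]
print(len(pool), [doc_id for doc_id, _ in top_3])

# Same candidates, but with 80% of the weight on keyword matches.
keyword_heavy = fuse(keyword_results, vector_results, 5, 5, keyword_weight=0.8)[:3]
print([doc_id for doc_id, _ in keyword_heavy])
```

Ten candidates are fetched, but "a" and "b" appear in both lists, so the pool holds eight distinct documents. With equal weights "b" comes out on top because both retrievers rank it highly; shifting the weight toward keywords pushes the keyword-only "c" into the top 3 in place of the vector-only "g".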

A common pitfall is not realizing that the nodes passed to BM25Retriever can be different from the nodes indexed by the VectorStoreIndex. While LlamaIndex often makes this seamless, you have full control. You can create a BM25Retriever from a subset of documents or even a different set of nodes than what your vector index uses, allowing for highly specialized hybrid strategies.

The next step is often exploring other fusion techniques, such as reciprocal rank fusion (RRF), which combines ranked lists from multiple retrieval sources using each result’s rank position rather than its raw score.
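For reference, RRF fits in a few lines. The sketch below is a standalone illustration rather than LlamaIndex’s own code: each document earns 1 / (k + rank) from every list it appears in, with k=60 being the constant commonly used for RRF. The two rankings are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids; returns (doc_id, score) pairs, best first."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the position matters, so raw score scales never need reconciling.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings, best match first.
bm25_ranking = ["doc4", "doc1", "doc3"]
vector_ranking = ["doc2", "doc4", "doc1"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}  {score:.4f}")
```

Documents that appear in both lists ("doc4" and "doc1" here) rise above documents that only one retriever found, even when a single retriever ranked those first.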
