LlamaIndex Query Planning: Decompose Complex Questions (2026)

The most surprising thing about query planning in LlamaIndex is that it’s not about finding the answer, but about breaking down the question into smaller, manageable pieces.

Imagine you ask LlamaIndex: "What was the revenue of Apple in Q4 2023, and how does that compare to its R&D spending in the same quarter?"

This isn’t a single, simple retrieval operation. It requires gathering two distinct pieces of information and then comparing them. Query planning is the process of identifying these sub-queries and the order in which they should be executed.

Here’s a simplified view of what that might look like internally:

# Conceptual representation of the query plan
query = "What was the revenue of Apple in Q4 2023, and how does that compare to its R&D spending in the same quarter?"

# LlamaIndex's query planner identifies these steps:
plan = [
    {
        "step": 1,
        "description": "Retrieve Apple's Q4 2023 revenue",
        "tool": "vector_index_retrieval",
        "params": {"query": "Apple Q4 2023 revenue"}
    },
    {
        "step": 2,
        "description": "Retrieve Apple's Q4 2023 R&D spending",
        "tool": "vector_index_retrieval",
        "params": {"query": "Apple Q4 2023 R&D spending"}
    },
    {
        "step": 3,
        "description": "Compare revenue and R&D spending",
        "tool": "llm_synthesis",
        "params": {
            "prompt": "Compare revenue: {revenue_result} and R&D spending: {rnd_result}",
            "dependencies": [1, 2] # Depends on the results of steps 1 and 2
        }
    }
]

# The query engine then executes these steps in dependency order:
# steps 1 and 2 are independent, and step 3 consumes both of their results.

The system you’re interacting with isn’t just a single LLM call. It’s a sophisticated orchestrator. It takes your complex query and breaks it down into a series of smaller, atomic queries that can be executed against your data sources (like vector indexes, SQL databases, or even other LLMs). It then figures out the dependencies between these atomic queries and how to combine their results.

The core problem query planning solves is that an LLM cannot directly answer a multi-faceted question that requires retrieving and processing information from disparate sources. Rather than letting the LLM guess at an answer or fail outright, LlamaIndex delegates the retrieval and aggregation tasks to specialized tools.

The QueryEngine is the entry point. When you call query_engine.query("your complex question"), the QueryEngine first passes the question to a planning component, called the QueryPlanner in this article. In llama_index.core, this role is played by components such as SubQuestionQueryEngine or QueryPlanTool. The QueryPlanner’s job is to inspect the question and decide which tools are needed and in what order, for example tools built on a VectorIndexRetriever or an NLSQLRetriever, or a custom FunctionTool. The result is a QueryPlan, which is essentially a structured list of operations.
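
To make "a structured list of operations" concrete, here is a minimal sketch of a plan as plain data plus a sanity check. The names PlanStep and validate_plan are hypothetical, chosen for illustration; they are not LlamaIndex classes or functions.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    """One operation in a plan (hypothetical structure, not a LlamaIndex class)."""
    step: int
    tool: str
    params: dict
    dependencies: list[int] = field(default_factory=list)


def validate_plan(steps: list[PlanStep], known_tools: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the plan is usable."""
    problems = []
    seen_ids: set[int] = set()
    for s in steps:
        if s.tool not in known_tools:
            problems.append(f"step {s.step}: unknown tool {s.tool!r}")
        for dep in s.dependencies:
            # A dependency must point at a step that appears earlier in the list.
            if dep not in seen_ids:
                problems.append(f"step {s.step}: dependency {dep} is not an earlier step")
        seen_ids.add(s.step)
    return problems


known_tools = {"vector_index_retrieval", "llm_synthesis"}
plan = [
    PlanStep(1, "vector_index_retrieval", {"query": "Apple Q4 2023 revenue"}),
    PlanStep(2, "vector_index_retrieval", {"query": "Apple Q4 2023 R&D spending"}),
    PlanStep(3, "llm_synthesis", {"prompt": "Compare the two figures"}, dependencies=[1, 2]),
]
print(validate_plan(plan, known_tools))  # prints []
```

Checking a plan before running it is a cheap way to catch an LLM-generated step that names a tool you never registered.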

The QueryEngine then takes this QueryPlan and executes it. For each step in the plan, it invokes the specified Tool with the appropriate parameters. If a step depends on the output of a previous step, the QueryEngine ensures that the previous step is completed first and its result is passed along. Finally, it synthesizes the results from all the executed steps into a coherent answer.
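
The execution loop described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not LlamaIndex's implementation: run_plan, fake_retrieve, fake_synthesize, and FAKE_STORE are hypothetical names, and the stored values are placeholders rather than real figures.

```python
from graphlib import TopologicalSorter


def run_plan(plan, tools):
    """Execute plan steps in dependency order and return {step_id: result}."""
    by_id = {s["step"]: s for s in plan}
    # Map each step to the set of steps it must wait for.
    graph = {s["step"]: set(s.get("dependencies", [])) for s in plan}
    results = {}
    for step_id in TopologicalSorter(graph).static_order():
        step = by_id[step_id]
        # Hand the outputs of prerequisite steps to the tool.
        inputs = [results[d] for d in sorted(graph[step_id])]
        results[step_id] = tools[step["tool"]](step["params"], inputs)
    return results


# Stub tools standing in for real retrieval and LLM calls (hypothetical).
FAKE_STORE = {"revenue": "revenue = A", "R&D": "R&D = B"}


def fake_retrieve(params, inputs):
    return FAKE_STORE[params["query"]]


def fake_synthesize(params, inputs):
    return params["prompt"] + ": " + " vs ".join(inputs)


tools = {"retrieve": fake_retrieve, "synthesize": fake_synthesize}
plan = [
    {"step": 1, "tool": "retrieve", "params": {"query": "revenue"}},
    {"step": 2, "tool": "retrieve", "params": {"query": "R&D"}},
    {"step": 3, "tool": "synthesize", "params": {"prompt": "Compare"}, "dependencies": [1, 2]},
]
results = run_plan(plan, tools)
print(results[3])  # prints: Compare: revenue = A vs R&D = B
```

The topological sort guarantees that a step never runs before the steps it depends on, while leaving independent steps free to run in any order.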

The QueryPlanner itself can be configured. You can choose between planning strategies, such as a hierarchical planner (HierarchicalPlanner in this article’s terms), which decomposes the query recursively, or a tool-calling planner (ToolCallingPlanner), which relies on the LLM’s tool-calling capability to produce the plan. The choice affects both how many LLM calls a query costs and how deep the generated plans become.
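
As a rough illustration of the recursive strategy, the sketch below decomposes a question until no further split is proposed. The functions hierarchical_plan and naive_split are hypothetical; in particular, naive_split is a toy string splitter standing in for the LLM call a real planner would make.

```python
from typing import Callable

# A "splitter" stands in for the LLM call that proposes sub-questions.
Splitter = Callable[[str], list[str]]


def hierarchical_plan(question: str, split: Splitter, max_depth: int = 3) -> list[str]:
    """Recursively decompose a question until the splitter returns no sub-questions."""
    if max_depth == 0:
        return [question]  # depth budget exhausted; stop splitting
    parts = split(question)
    if not parts:
        return [question]  # already atomic
    leaves: list[str] = []
    for part in parts:
        leaves.extend(hierarchical_plan(part, split, max_depth - 1))
    return leaves


def naive_split(question: str) -> list[str]:
    """Toy splitter: break on the first ', and ' (a real planner would ask an LLM)."""
    head, sep, tail = question.partition(", and ")
    return [head.strip(), tail.strip()] if sep else []


leaves = hierarchical_plan(
    "What was revenue, and what was R&D spending, and how do they compare?",
    naive_split,
)
print(leaves)
# prints: ['What was revenue', 'what was R&D spending', 'how do they compare?']
```

The max_depth guard matters in practice: without it, a splitter that keeps proposing sub-questions would recurse indefinitely and multiply LLM calls.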

Executing a plan step usually means calling a retriever (such as VectorIndexRetriever for text data or NLSQLRetriever for SQL databases) or an LLM-backed synthesis step (called an LLMTool here). The retriever fetches relevant text chunks or table rows, and the synthesis step uses an LLM to turn them into a natural-language response or an intermediate result.
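
The division of labor between retrieval and synthesis can be shown with a toy example. Note the substitution: a real vector retriever ranks chunks by embedding similarity, whereas this sketch uses plain word overlap so that it runs without any model. The functions retrieve and synthesize are hypothetical, and synthesize only assembles the prompt an LLM would receive.

```python
def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank chunks by how many query words they share."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(c.lower().split())), c) for c in chunks]
    # Keep only chunks with at least one shared word, best matches first.
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


def synthesize(question: str, context: list[str]) -> str:
    """Stand-in for the LLM call: just assembles the prompt it would send."""
    return "Context:\n" + "\n".join(context) + f"\nQuestion: {question}"


chunks = [
    "quarterly revenue is reported in the income statement",
    "the cafeteria menu changes weekly",
    "revenue and R&D spending appear in the quarterly filing",
]
hits = retrieve("quarterly revenue", chunks)
print(synthesize("Where is quarterly revenue reported?", hits))
```

Here the unrelated cafeteria chunk never reaches the prompt, which is the point of retrieving before synthesizing.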

One critical aspect often overlooked is how LlamaIndex handles LLM context window limits when synthesizing an answer from many retrieved pieces of information. Dumping all retrieved text into a single prompt (often called "context stuffing") stops working once the text outgrows the window. Instead, its response synthesizers can use modes such as compact, refine, or tree_summarize, which pack retrieved chunks into prompt-sized batches, refine an answer batch by batch, or summarize intermediate results before combining them. This keeps the final synthesis within the LLM’s context even when several steps retrieve a large volume of data.
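
The batching and iterative refinement idea can be sketched as follows. This is an illustration under assumptions, not the library's code: batch_chunks, refine, and fake_llm are hypothetical names, word counts stand in for token counts, and the stub LLM only records how many chunks it was shown.

```python
def batch_chunks(chunks: list[str], max_words: int) -> list[list[str]]:
    """Greedily pack chunks into batches whose total word count fits the budget.

    A single chunk larger than the budget still gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for chunk in chunks:
        size = len(chunk.split())
        if current and used + size > max_words:
            batches.append(current)
            current, used = [], 0
        current.append(chunk)
        used += size
    if current:
        batches.append(current)
    return batches


def refine(question, chunks, llm, max_words=8):
    """Iteratively update an answer, showing the LLM one batch at a time."""
    answer = ""
    for batch in batch_chunks(chunks, max_words):
        answer = llm(question, answer, batch)
    return answer


def fake_llm(question, previous_answer, batch):
    """Stub LLM (hypothetical): appends a marker recording how many chunks it saw."""
    return previous_answer + f"[{len(batch)}]"


chunks = ["one two three", "four five six", "seven eight nine ten", "eleven"]
print(batch_chunks(chunks, max_words=8))
print(refine("q", chunks, fake_llm))  # prints: [2][2]
```

Each LLM call sees only one batch plus the answer so far, so the prompt size stays bounded no matter how many chunks were retrieved.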

The next concept you’ll likely explore is how to customize the Tools available to the QueryPlanner to integrate with your specific data sources or external APIs.