LlamaIndex Prompts: Customize System and Query Templates (2026)

The prompt templates in LlamaIndex are not just static strings; they’re dynamic, context-aware structures that adapt to the specific query and the system’s knowledge, making them far more powerful than simple fill-in-the-blanks.

Let’s see this in action. Imagine you have a simple RAG (Retrieval Augmented Generation) setup with an VectorStoreIndex. You’ve indexed some documents and now you want to query them.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.prompts import PromptTemplate

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Build index
index = VectorStoreIndex.from_documents(documents)

# Create a query engine
query_engine = index.as_query_engine()

# --- Customizing System and Query Templates ---

# 1. Default System Prompt (for context)
# This is what the LLM sees as its instructions before processing the query.
# It's a template that will be filled with retrieved context and the user's query.
# The default usually looks something like this:
# "The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.\n\n"
# "Current conversation:\n{chat_history}\n\n"
# "Human: {query_str}\nAI:"

# 2. Default Query Template (for the core question)
# This is the template used to format the actual question that gets sent to the LLM,
# often incorporating retrieved context. A simplified version of what the query engine
# might use internally to combine context and query:
# "{context_str}\n\n"
# "Based on the above context, please answer the following question: {query_str}"

# Let's create custom templates
custom_system_prompt = PromptTemplate(
    "You are an expert legal assistant. Your sole purpose is to answer questions based on the provided legal documents. Be precise and cite relevant sections if possible. Do not make up information.\n\n"
    "Legal Documents:\n{context_str}\n\n"
    "User Question: {query_str}\n"
    "Legal Assistant Answer:"
)

custom_query_template = PromptTemplate(
    "Using ONLY the information from the provided legal documents, answer the following question: {query_str}\n"
    "If the information is not present in the documents, state that clearly."
)

# --- Applying Custom Templates ---

# Option A: Pass templates directly to the query engine constructor
custom_query_engine_A = index.as_query_engine(
    system_prompt=custom_system_prompt,
    text_qa_template=custom_query_template, # For simple QA, this is the relevant template
    # Other potential templates exist for different modes, e.g., refine_template, chat_template
)

# Option B: Configure global settings (affects all new query engines created after this)
# Settings.system_prompt = custom_system_prompt
# Settings.text_qa_template = custom_query_template
# custom_query_engine_B = index.as_query_engine()

# Let's use Option A for demonstration
response = custom_query_engine_A.query("What is the statute of limitations for breach of contract in California?")

print(response)

Here’s what’s happening under the hood:

When you create a query_engine, it has a set of default prompt templates. These templates are responsible for structuring the input that LlamaIndex sends to the underlying Large Language Model (LLM).

System Prompt: This is the initial instruction given to the LLM. It sets the persona, the task, and the constraints. In our example, we’re telling the LLM it’s a "legal assistant," its purpose is to answer "based on provided legal documents," and to "be precise and cite relevant sections." This is crucial for guiding the LLM’s behavior. The {context_str} placeholder is dynamically filled with the text retrieved from your index that’s most relevant to the user’s query. The {query_str} is the user’s actual question.
Text QA Template (or other specific templates): This template formats the core question and the retrieved context into a coherent prompt for the LLM. LlamaIndex retrieves relevant chunks of text from your index. These chunks are then injected into the text_qa_template (for standard question-answering tasks) via the {context_str} placeholder. The user’s original query goes into {query_str}. The LLM then uses this combined information to generate an answer.

By customizing these templates, you gain fine-grained control over how the LLM interprets your data and responds to queries. You can steer its persona, enforce specific output formats, and ensure it adheres to certain rules, like only using provided context. This is how you move from generic answers to highly tailored, domain-specific responses.

The most surprising thing about LlamaIndex’s prompt templating is how it intelligently merges different types of context. For example, in a chat-based system, the chat_template would combine the system prompt, the conversation history ({chat_history}), and the new user query ({query_str}) into a single, coherent prompt for the LLM, maintaining conversational state without explicit manual management.

You’ll next want to explore how LlamaIndex handles different types of prompts for various use cases, such as summarization or structured data extraction, and how to manage prompt versions across different LLM providers.