By default, LlamaIndex keeps your index and DocStore in memory and does not save them to disk, so a script that builds an index rebuilds it from scratch every time your application restarts.

Let’s see this in action. Imagine you have a few documents and you want to index them.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Create an index
index = VectorStoreIndex.from_documents(documents)

# Now, if you were to just let your script end here and run it again,
# LlamaIndex would re-read all your documents and rebuild the index.
# This can be slow and resource-intensive for large datasets.

The problem LlamaIndex persistence solves is exactly that: avoiding the costly re-indexing process. Instead of re-reading and processing all your source documents every time, you can save the state of your index and its associated data structures (like the DocStore) to disk. When your application starts up again, you load this saved state, making the index immediately ready for querying.

Internally, LlamaIndex uses a StorageContext to manage where and how data is stored. When you build an index, it populates this StorageContext with components like the VectorStore (which holds the embeddings) and the DocStore (which stores the original text and metadata of your documents). Persistence means serializing these components to disk.

Here’s how you actually make it persistent:

import os

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, load_index_from_storage

# Define the directory where you want to save your index
persist_dir = "./storage"

# --- First run: Create and save the index ---
if not os.path.exists(persist_dir):
    print("Creating and saving index...")
    # Load documents
    documents = SimpleDirectoryReader("./data").load_data()

    # Create a storage context
    storage_context = StorageContext.from_defaults()

    # Create an index and add documents to it
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

    # Save the index to the specified directory
    index.storage_context.persist(persist_dir=persist_dir)
    print(f"Index saved to {persist_dir}")
else:
    print(f"Loading index from {persist_dir}...")
    # Load the index from the storage directory
    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    index = load_index_from_storage(storage_context)
    print("Index loaded.")

# Now you can use the index for querying, whether it was just created or loaded
query_engine = index.as_query_engine()
response = query_engine.query("What is in the data?")
print(response)

The key components here are StorageContext.from_defaults() (which creates default in-memory stores when called with no arguments, and loads previously saved stores when given a persist_dir), index.storage_context.persist(persist_dir="./storage") to write everything to disk, and load_index_from_storage(storage_context) to bring it back.
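The script above treats the mere existence of the directory as proof that a saved index is there. If a previous run crashed before persist() finished, the directory could exist but be empty or incomplete. A minimal sketch of a stricter check follows, using only the standard library. The helper name has_persisted_index is hypothetical, and the file names are the ones this article mentions; verify them against what your installed LlamaIndex version actually writes.

```python
from pathlib import Path

# Assumed default file names (taken from this article); check your own
# persist directory, since names can differ between LlamaIndex versions.
EXPECTED_FILES = ("docstore.json", "index_store.json")


def has_persisted_index(persist_dir: str) -> bool:
    """Return True only if every expected store file exists and is non-empty."""
    root = Path(persist_dir)
    return all(
        (root / name).is_file() and (root / name).stat().st_size > 0
        for name in EXPECTED_FILES
    )
```

You could then write `if not has_persisted_index(persist_dir):` in place of the `os.path.exists` test, so that a half-written directory triggers a rebuild instead of a load error.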

When you call persist(), LlamaIndex serializes several things:

  1. Vector Store: This is where your document embeddings are stored. The default in-memory vector store (SimpleVectorStore) is saved as a JSON file (vector_store.json or default__vector_store.json, depending on the LlamaIndex version). If you’re using a separate vector database, the embeddings typically already live in that database, so persist() does not write them to a local file.
  2. Doc Store: This stores the actual text and metadata of your documents. It’s crucial for retrieving the source text when answering queries. This is typically saved as docstore.json.
  3. Index Store: This stores the structure of your index, mapping document IDs to node IDs and other index-specific metadata. This is often saved as index_store.json.
  4. Image/Graph Stores (if applicable): If you’re using image or graph-based indexing, those components will also be serialized.
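To see which of these files your own setup produces, you can inspect the persist directory after calling persist(). The sketch below is a standard-library helper, not part of LlamaIndex; the name summarize_persist_dir is hypothetical. It reports each JSON file's size and top-level keys without assuming any particular internal layout.

```python
import json
from pathlib import Path


def summarize_persist_dir(persist_dir: str) -> dict[str, dict]:
    """Map each JSON file in persist_dir to its byte size and top-level keys."""
    summary = {}
    for path in sorted(Path(persist_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)  # loads the whole file; slow for huge stores
        summary[path.name] = {
            "bytes": path.stat().st_size,
            # Only dict-shaped files have keys worth listing.
            "top_level_keys": sorted(data) if isinstance(data, dict) else [],
        }
    return summary


if __name__ == "__main__":
    for name, info in summarize_persist_dir("./storage").items():
        print(name, info["bytes"], info["top_level_keys"])
```

Running it against "./storage" after the first run of the script above shows which store files exist and how large each one is.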

The most surprising thing about LlamaIndex persistence, especially when using the default SimpleVectorStore, is that the serialization format is plain JSON. For very large indexes, the vector store file can become enormous, which can lead to slow load times. It’s not a binary format optimized for massive datasets, but a human-readable, easily inspectable representation. That is convenient for debugging, but it can become a performance bottleneck for production-scale deployments unless you use a dedicated vector database.
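To get a feel for the overhead of storing embeddings as JSON text, here is a small standard-library comparison on synthetic data. It does not use LlamaIndex and is not how LlamaIndex serializes anything; the function name compare_sizes, the vector count, and the dimension are arbitrary assumptions chosen for illustration. Note that the packed form uses 32-bit floats, so it also trades away some precision.

```python
import json
import random
from array import array


def compare_sizes(num_vectors: int = 100, dim: int = 256, seed: int = 0) -> tuple[int, int]:
    """Return (json_bytes, binary_bytes) for the same synthetic embeddings."""
    rng = random.Random(seed)
    vectors = [
        [rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_vectors)
    ]
    # JSON writes every float as decimal text plus separators.
    json_bytes = len(json.dumps(vectors).encode("utf-8"))
    # Packed 32-bit floats use a fixed number of bytes per value.
    binary_bytes = sum(len(array("f", v).tobytes()) for v in vectors)
    return json_bytes, binary_bytes


if __name__ == "__main__":
    json_bytes, binary_bytes = compare_sizes()
    print(f"JSON: {json_bytes} bytes, packed float32: {binary_bytes} bytes")
```

The JSON figure comes out several times larger than the packed one, and the gap grows linearly with the number of vectors, which is why large indexes are usually better served by a dedicated vector database.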

Once you’ve successfully loaded an index, the next logical step is to explore different ways to query it, such as using more advanced query engines or fine-tuning retrieval strategies.
