Neo4j stores relationships as first-class records alongside the nodes they connect, not just isolated data points, which is why it excels at connecting disparate pieces of information.

Let’s build a knowledge graph from some text data and query it using LlamaIndex and Neo4j. Imagine we have a collection of documents about different companies and their products.

First, set up your Neo4j instance. You can use Docker for a quick start:

docker run --publish=7474:7474 --publish=7687:7687 -d \
    -v $HOME/neo4j/data:/data \
    -e NEO4J_AUTH=neo4j/neo4j \
    neo4j:latest

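This starts a Neo4j instance, sets the credentials through NEO4J_AUTH (here neo4j/neo4j), and mounts a local directory for data persistence. Recent Neo4j releases reject neo4j as an initial password and expect at least eight characters, so if the container exits on startup, pick a different password and use the same value for NEO4J_PASSWORD in the script below.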
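Before moving on, it can help to confirm that the two published ports are actually reachable. The following is a minimal standard-library sketch; port_is_open is a hypothetical helper name, and an open port only shows that something is listening, not that Neo4j has finished starting or that the credentials are accepted.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # 7474 and 7687 are the ports published by the docker run command above.
    for name, port in (("HTTP browser UI", 7474), ("Bolt", 7687)):
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} port {port}: {status}")
```

If the Bolt port is not reachable a few seconds after starting the container, docker logs on the container usually explains why.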

Now, let’s use LlamaIndex to ingest and index this data into Neo4j. We’ll need to install the necessary libraries:

pip install llama-index llama-index-graph-stores-neo4j neo4j
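Since the Neo4j integration ships as a separate package, a quick check that all three distributions are installed can save a confusing ImportError later. This is a small sketch using only the standard library; installed_versions is a hypothetical helper, and the package names are the ones from the pip command above.

```python
from importlib import metadata

# Distribution names taken from the pip install command above.
REQUIRED = ["llama-index", "llama-index-graph-stores-neo4j", "neo4j"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```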

Here’s a Python script to perform the indexing:

import os
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.graph_stores.neo4j import Neo4jGraphStore
from llama_index.core.indices.knowledge_graph import KnowledgeGraphIndex

# --- Configuration ---
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "neo4j"
NEO4J_DATABASE = "neo4j" # Default database
INDEX_DIR = "./data" # Directory containing your text files

# --- Prepare Sample Data (if you don't have any) ---
if not os.path.exists(INDEX_DIR):
    os.makedirs(INDEX_DIR)
    with open(os.path.join(INDEX_DIR, "company_a.txt"), "w") as f:
        f.write("Company A is a leading technology firm specializing in AI research. Their flagship product is 'InnovateAI', a platform for developing machine learning models.")
    with open(os.path.join(INDEX_DIR, "company_b.txt"), "w") as f:
        f.write("Company B is a software development company known for its productivity suite. Their main product, 'TaskMaster', helps teams manage projects efficiently.")
    with open(os.path.join(INDEX_DIR, "product_innovateai.txt"), "w") as f:
        f.write("InnovateAI is an AI research platform developed by Company A. It allows users to build and deploy custom machine learning models.")
    with open(os.path.join(INDEX_DIR, "product_taskmaster.txt"), "w") as f:
        f.write("TaskMaster is a project management software created by Company B. It offers features for task assignment, progress tracking, and collaboration.")

# --- Initialize Neo4j Graph Store ---
graph_store = Neo4jGraphStore(
    uri=NEO4J_URI,
    username=NEO4J_USER,
    password=NEO4J_PASSWORD,
    database=NEO4J_DATABASE,
)

# --- Load Data ---
documents = SimpleDirectoryReader(INDEX_DIR).load_data()

# --- Create Knowledge Graph Index ---
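# Triplet extraction uses whichever LLM is configured. If none is set,
# LlamaIndex falls back to its default OpenAI model, which expects
# OPENAI_API_KEY in the environment. A stronger model generally
# extracts cleaner triplets.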
storage_context = StorageContext.from_defaults(graph_store=graph_store)
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    # You can configure the LLM here if needed, e.g. (after
    # `from llama_index.llms.openai import OpenAI`):
    # llm=OpenAI(model="gpt-4"),
    # max_triplets_per_chunk=10,
)

print("Knowledge graph indexing complete!")

# --- Querying the Knowledge Graph ---
# You can now query the graph in natural language. The query engine
# retrieves relevant triplets from Neo4j and passes them to the LLM,
# which composes the answer.
query_engine = kg_index.as_query_engine()

response = query_engine.query("What product does Company A make?")
print(f"Query: What product does Company A make?\nResponse: {response}\n")

response = query_engine.query("Who developed TaskMaster?")
print(f"Query: Who developed TaskMaster?\nResponse: {response}\n")

response = query_engine.query("Tell me about InnovateAI.")
print(f"Query: Tell me about InnovateAI.\nResponse: {response}\n")

When you run this script, LlamaIndex will:

  1. Connect to Neo4j: It establishes a connection using the provided URI, username, and password.
  2. Read Documents: It loads text content from the data directory.
  3. Extract Entities and Relationships: It uses an LLM to identify entities (like "Company A", "InnovateAI") and the relationships between them (e.g., (Company A)-[:DEVELOPS]->(InnovateAI)).
  4. Create Nodes and Relationships in Neo4j: For each extracted triplet (head, relation, tail), it creates corresponding nodes and relationships in your Neo4j database. If nodes or relationships already exist, it will attempt to merge them.
  5. Build an Index: It creates a LlamaIndex KnowledgeGraphIndex object that references the Neo4j graph store.
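To make steps 3 and 4 more concrete, here is a rough sketch of how a single extracted triplet could be turned into a parameterized Cypher MERGE statement. It is an illustration, not the library's code: triplet_to_cypher is a hypothetical helper, and the Entity label, the id property, and the upper-casing of the relation name are assumptions about how the graph store lays out data.

```python
def triplet_to_cypher(subj: str, rel: str, obj: str, node_label: str = "Entity"):
    """Build a MERGE statement and its parameters for one (head, relation, tail).

    Assumptions (not confirmed by the article): nodes are keyed on an `id`
    property, and relation names become upper-case relationship types.
    """
    # Relationship types cannot be passed as query parameters, so the type is
    # normalized and interpolated. A real implementation should also reject
    # backticks in `rel` and `node_label`.
    rel_type = rel.strip().replace(" ", "_").upper()
    query = (
        f"MERGE (a:`{node_label}` {{id: $subj}}) "
        f"MERGE (b:`{node_label}` {{id: $obj}}) "
        f"MERGE (a)-[:`{rel_type}`]->(b)"
    )
    # Entity names travel as parameters, which avoids quoting problems.
    return query, {"subj": subj, "obj": obj}


if __name__ == "__main__":
    query, params = triplet_to_cypher("Company A", "develops", "InnovateAI")
    print(query)
    print(params)
```

MERGE rather than CREATE is what makes repeated indexing runs safe: an entity or relationship that already exists is matched instead of duplicated.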

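The triplets themselves live in Neo4j; the KnowledgeGraphIndex is mostly a handle that knows how to reach the graph store. When you call as_query_engine(), the default retriever extracts keywords from your question, looks up the matching entities and their relationships in Neo4j, and hands the retrieved triplets to the LLM. The Cypher that fetches them is issued by the graph store on your behalf. If you want the LLM to write the Cypher query itself, LlamaIndex provides a separate KnowledgeGraphQueryEngine for that.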

For example, answering "What product does Company A make?" comes down to a graph lookup akin to:

MATCH (c:Entity {id: "Company A"})-[:DEVELOPS]->(p:Entity)
RETURN p.id

This Cypher query finds a node labeled Entity whose id is "Company A", follows the DEVELOPS relationship to another Entity node, and returns the id of that product node. By default, Neo4jGraphStore labels nodes Entity and keeps the entity name in the id property; the relationship type depends on what the LLM extracted, so DEVELOPS here is only illustrative.
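The pattern in that query, (start node)-[relation]->(end node), is easy to mimic in plain Python, which can help when reasoning about what a lookup should return. The sketch below runs against a small in-memory list of triplets; the triplets and the match_outgoing helper are hypothetical stand-ins for whatever the LLM extracted into your graph.

```python
# Hypothetical triplets of the kind the extraction step might produce.
TRIPLETS = [
    ("Company A", "DEVELOPS", "InnovateAI"),
    ("Company B", "DEVELOPS", "TaskMaster"),
    ("Company A", "SPECIALIZES_IN", "AI research"),
]


def match_outgoing(triplets, subject, relation):
    """Return every tail reachable from `subject` over one `relation` edge."""
    return [tail for head, rel, tail in triplets if head == subject and rel == relation]


if __name__ == "__main__":
    print(match_outgoing(TRIPLETS, "Company A", "DEVELOPS"))
```

The list comprehension plays the role of MATCH plus RETURN: it filters on the start node and the relationship type, then projects the end node.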

What stands out about this setup is that you can ask natural language questions over a graph database without writing any Cypher yourself. LlamaIndex handles retrieval from Neo4j, and the LLM turns the retrieved relationships into an answer.

The next step is to explore more complex queries, such as finding products developed by companies that also specialize in AI, or tracing the development lineage of a product.
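Both kinds of query can be prototyped on in-memory triplets before committing to Cypher. The sketch below shows a two-hop lookup (companies with a given specialty, then their products) and a lineage trace that follows one relation transitively. All relation names, the BASED_ON edges, and the "Product v1" to "Product v3" entries are hypothetical; they do not come from the sample documents.

```python
from collections import deque

TRIPLETS = [
    ("Company A", "SPECIALIZES_IN", "AI research"),
    ("Company A", "DEVELOPS", "InnovateAI"),
    ("Company B", "DEVELOPS", "TaskMaster"),
    # Hypothetical lineage edges, not present in the sample documents.
    ("Product v3", "BASED_ON", "Product v2"),
    ("Product v2", "BASED_ON", "Product v1"),
]


def neighbors(triplets, node, relation):
    """Tails reachable from `node` over a single `relation` edge."""
    return [tail for head, rel, tail in triplets if head == node and rel == relation]


def products_of_specialists(triplets, specialty):
    """Two hops: find companies with the specialty, then what they develop."""
    companies = [h for h, r, t in triplets if r == "SPECIALIZES_IN" and t == specialty]
    return [p for company in companies for p in neighbors(triplets, company, "DEVELOPS")]


def lineage(triplets, start, relation="BASED_ON"):
    """Breadth-first walk along `relation`, returning ancestors in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(triplets, node, relation):
            if nxt not in seen:  # guards against cycles in the graph
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    print(products_of_specialists(TRIPLETS, "AI research"))
    print(lineage(TRIPLETS, "Product v3"))
```

In Cypher, the first function corresponds to a MATCH with two relationship patterns sharing the company node, and the second to a variable-length path such as -[:BASED_ON*]->.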
