A property graph is more than just nodes and edges: both nodes and edges can carry a label and key-value properties, which lets the graph capture the rich relationships between things.
Let’s see how LlamaIndex uses a property graph to supercharge Retrieval Augmented Generation (RAG). Imagine we have a dataset about a fictional company, "Acme Corp," with information on its employees, projects, and departments.
```python
from typing import Literal

from llama_index.core import Document, PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.core.vector_stores.simple import SimpleVectorStore
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore  # pip install llama-index-graph-stores-nebula
from llama_index.llms.openai import OpenAI  # pip install llama-index-llms-openai

# Assume you have a NebulaGraph instance running and accessible on the default
# graphd port (9669), and that the graph space "acme_corp" already exists.
# Depending on the integration version, custom metadata keys may also need to be
# declared through the store's props_schema argument.
graph_store = NebulaPropertyGraphStore(
    space="acme_corp",  # the graph space where data will be stored
    username="root",
    password="password",
    url="nebula://localhost:9669",
)
# NebulaGraph does not hold the embeddings used for vector retrieval,
# so pair it with a separate vector store
vec_store = SimpleVectorStore()

# Sample data representing Acme Corp's structure
documents = [
    Document(
        id_="emp_1",
        text="Alice works in the Engineering department and is a lead on Project X.",
        metadata={"type": "employee", "name": "Alice", "department": "Engineering", "role": "Lead"},
    ),
    Document(
        id_="emp_2",
        text="Bob is a software engineer in the Engineering department, contributing to Project Y.",
        metadata={"type": "employee", "name": "Bob", "department": "Engineering", "role": "Software Engineer"},
    ),
    Document(
        id_="proj_x",
        text="Project X is a critical initiative focused on developing new AI features. Alice is leading it.",
        metadata={"type": "project", "name": "Project X", "focus": "AI Features"},
    ),
    Document(
        id_="proj_y",
        text="Project Y aims to improve database performance. Bob is a key member.",
        metadata={"type": "project", "name": "Project Y", "focus": "Database Performance"},
    ),
    Document(
        id_="dept_eng",
        text="The Engineering department is responsible for all product development at Acme Corp.",
        metadata={"type": "department", "name": "Engineering"},
    ),
]

# Define the graph schema: which entity (node) types and relation (edge) types may
# be extracted, and which (source, relation, target) combinations are valid.
# This is the crucial part for shaping your graph.
entities = Literal["EMPLOYEE", "PROJECT", "DEPARTMENT"]
relations = Literal["WORKS_IN", "LEADS", "CONTRIBUTES_TO", "PART_OF"]
# Recent versions accept (source, relation, target) triples here; older releases
# expected a dict mapping each entity type to its allowed relations.
validation_schema = [
    ("EMPLOYEE", "WORKS_IN", "DEPARTMENT"),
    ("EMPLOYEE", "LEADS", "PROJECT"),
    ("EMPLOYEE", "CONTRIBUTES_TO", "PROJECT"),
    ("PROJECT", "PART_OF", "DEPARTMENT"),
]

kg_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model="gpt-4o-mini"),  # assumes OPENAI_API_KEY is set in the environment
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,  # discard extracted triples that fall outside the schema
)

# Create the property graph index: the extractor reads each document's text and
# writes the resulting nodes and relationships to NebulaGraph
pg_index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    property_graph_store=graph_store,
    vector_store=vec_store,
)

# Query the graph (include_text also returns the source text behind each fact)
query_engine = pg_index.as_query_engine(include_text=True)
response = query_engine.query("Who is leading Project X?")
print(response)
response = query_engine.query("What projects are in the Engineering department?")
print(response)
```
This code snippet demonstrates how LlamaIndex can take documents (text plus metadata) and turn them into a property graph stored in NebulaGraph. The schema is the key part. In LlamaIndex it is supplied to SchemaLLMPathExtractor: possible_entities lists the kinds of entities (nodes) to look for, possible_relations lists the kinds of relationships (edges), and kg_validation_schema states which entity types each relationship may connect. For instance, given the sentence "Alice works in the Engineering department and is a lead on Project X.", the extractor should produce an EMPLOYEE node for Alice, a PROJECT node for Project X, and an edge from Alice to Project X labeled LEADS. With strict=True, any extracted triple that does not fit the schema is dropped. Because an LLM does the extraction, the exact output can vary between runs and models.
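To make the role of the validation schema concrete, here is a minimal pure-Python sketch of the kind of filtering strict=True implies: each extracted triple is kept only if its source type, relation, and target type appear together in the allowed list. The Triple type, the ALLOWED set, and the filter_triples helper are hypothetical illustrations, not LlamaIndex internals, and the input triples merely stand in for what an LLM extractor might return.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted fact with the entity type of each end."""
    subject: str
    subject_type: str
    relation: str
    obj: str
    obj_type: str


# Hypothetical schema mirroring the Acme Corp example: (source type, relation, target type)
ALLOWED = {
    ("EMPLOYEE", "WORKS_IN", "DEPARTMENT"),
    ("EMPLOYEE", "LEADS", "PROJECT"),
    ("EMPLOYEE", "CONTRIBUTES_TO", "PROJECT"),
    ("PROJECT", "PART_OF", "DEPARTMENT"),
}


def filter_triples(triples: list[Triple], allowed: set[tuple[str, str, str]]) -> list[Triple]:
    """Keep only triples whose (source type, relation, target type) is in the schema."""
    return [t for t in triples if (t.subject_type, t.relation, t.obj_type) in allowed]


# Stand-ins for what an LLM might extract from the sample sentences
extracted = [
    Triple("Alice", "EMPLOYEE", "LEADS", "Project X", "PROJECT"),
    Triple("Alice", "EMPLOYEE", "WORKS_IN", "Engineering", "DEPARTMENT"),
    Triple("Project X", "PROJECT", "LEADS", "Alice", "EMPLOYEE"),  # wrong direction
    Triple("Bob", "EMPLOYEE", "LIKES", "Project Y", "PROJECT"),  # relation not in schema
]

for t in filter_triples(extracted, ALLOWED):
    print(f"{t.subject} -[{t.relation}]-> {t.obj}")
# prints:
# Alice -[LEADS]-> Project X
# Alice -[WORKS_IN]-> Engineering
```

Rejecting the reversed LEADS triple shows why direction matters: the schema encodes which type sits at each end of a relationship, not just which relationship names exist.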
The real power comes from combining this graph structure with vector search. PropertyGraphIndex can embed the entities it extracts (keeping the vectors in a separate vector store when, as with NebulaGraph, the graph database does not hold them). At query time its default retrievers can use vector similarity, together with LLM-generated synonyms and keywords, to pick starting entities, then follow their edges to collect connected facts along with the source text those facts came from. This hybrid approach allows more precise retrieval than pure vector search because it leverages the explicit relationships defined in your graph. For example, for "Who are the engineers working on AI projects?", similarity matching can land on the Engineering department and on Project X (whose focus is AI features), and graph traversal then reaches the employees linked to both.
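The pattern described above, vector similarity to choose starting entities followed by graph expansion, can be sketched without any external services. This toy version assumes hand-written 3-dimensional "embeddings" (a real system would call an embedding model), an in-memory edge list, and a hypothetical hybrid_retrieve helper; it illustrates the idea rather than how PropertyGraphIndex implements it.

```python
import math

# Toy embeddings for graph entities (a real system would use an embedding model).
# Dimensions loosely mean: [people, AI/ML, databases]
ENTITY_VECTORS = {
    "Alice": [0.9, 0.3, 0.0],
    "Bob": [0.9, 0.0, 0.3],
    "Project X": [0.1, 0.9, 0.0],
    "Project Y": [0.1, 0.0, 0.9],
    "Engineering": [0.5, 0.4, 0.4],
}

# Directed, labeled edges: (source, relation, target)
EDGES = [
    ("Alice", "WORKS_IN", "Engineering"),
    ("Alice", "LEADS", "Project X"),
    ("Bob", "WORKS_IN", "Engineering"),
    ("Bob", "CONTRIBUTES_TO", "Project Y"),
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hybrid_retrieve(query_vec, top_k=1):
    """Pick the top_k most similar entities, then return every edge touching them."""
    ranked = sorted(ENTITY_VECTORS, key=lambda e: cosine(query_vec, ENTITY_VECTORS[e]), reverse=True)
    seeds = set(ranked[:top_k])
    return [edge for edge in EDGES if edge[0] in seeds or edge[2] in seeds]


# A query like "Who is leading Project X?", assumed to embed close to the AI/ML axis
for src, rel, dst in hybrid_retrieve([0.2, 0.9, 0.1]):
    print(f"{src} -[{rel}]-> {dst}")
# prints: Alice -[LEADS]-> Project X
```

The similarity step alone would stop at the Project X entity; the one-hop expansion is what surfaces the fact that Alice leads it.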
The mental model for graph-enhanced RAG is this: your knowledge base isn’t just a flat collection of documents; it’s a network. The vector index acts as a general semantic search, finding documents that are similar to your query. The property graph index, however, finds entities that are connected according to explicit rules you define. By using both, you can ask questions that require understanding both the meaning of words and the structure of your data. For instance, a vector search might find documents about Alice, but the property graph knows specifically that Alice leads Project X.
When you call PropertyGraphIndex.from_documents, LlamaIndex first splits the documents into text chunks and then runs each configured extractor over them. SchemaLLMPathExtractor prompts the LLM to pull (entity, relation, entity) triples out of the chunk text, such as Alice, LEADS, Project X, and keeps only the entity and relation types the schema allows. The metadata dictionary (type, name, role, and so on) travels with each chunk as properties; it does not by itself create typed nodes or edges. Each extracted entity is also linked back to the chunk that mentions it, so retrieval can return the original text alongside the graph facts. The resulting nodes and edges are written to your chosen graph store (like NebulaGraph).
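The ingestion flow just described can be mimicked with a tiny in-memory property graph. In this sketch the LLM step is replaced by a hypothetical EXTRACTED lookup table, and the Node and PropertyGraph classes and the ingest function are illustrative assumptions; the chunk-to-entity MENTIONS link mirrors the idea of linking entities back to their source text. It shows the shape of the result (a chunk node carrying the metadata, typed entity nodes, and edges between them), not LlamaIndex's actual storage code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: set = field(default_factory=set)  # (source id, relation, target id)

    def upsert_node(self, node: Node) -> None:
        # Merge properties if the node already exists, so entities are deduplicated by id
        if node.id in self.nodes:
            self.nodes[node.id].properties.update(node.properties)
        else:
            self.nodes[node.id] = node

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges.add((src, relation, dst))


# Hypothetical chunks: chunk id -> (text, metadata)
CHUNKS = {
    "emp_1": ("Alice works in the Engineering department and is a lead on Project X.",
              {"type": "employee", "name": "Alice", "role": "Lead"}),
    "proj_x": ("Project X is a critical initiative focused on developing new AI features. Alice is leading it.",
               {"type": "project", "name": "Project X", "focus": "AI Features"}),
}

# Hypothetical extractor output per chunk, standing in for the LLM call
EXTRACTED = {
    "emp_1": [("Alice", "EMPLOYEE", "WORKS_IN", "Engineering", "DEPARTMENT"),
              ("Alice", "EMPLOYEE", "LEADS", "Project X", "PROJECT")],
    "proj_x": [("Alice", "EMPLOYEE", "LEADS", "Project X", "PROJECT")],
}


def ingest(graph: PropertyGraph, chunks: dict, extracted: dict) -> None:
    for chunk_id, (text, metadata) in chunks.items():
        # The chunk itself becomes a node; its metadata rides along as properties
        graph.upsert_node(Node(chunk_id, "CHUNK", {"text": text, **metadata}))
        for subj, subj_type, rel, obj, obj_type in extracted.get(chunk_id, []):
            graph.upsert_node(Node(subj, subj_type))
            graph.upsert_node(Node(obj, obj_type))
            graph.add_edge(subj, rel, obj)
            # Link the chunk to every entity it mentions
            graph.add_edge(chunk_id, "MENTIONS", subj)
            graph.add_edge(chunk_id, "MENTIONS", obj)


graph = PropertyGraph()
ingest(graph, CHUNKS, EXTRACTED)
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")  # prints: 5 nodes, 7 edges
```

Because entities are keyed by name, the LEADS fact extracted from both chunks is stored once, while each chunk keeps its own MENTIONS links to the entities it contains.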
What many people miss is that graph traversal isn’t limited to direct links. Retrieval can follow multi-hop paths (for example, LlamaIndex’s VectorContextRetriever has a path_depth setting that controls how many hops it expands from each starting node), so a question like "Find all employees who work in departments that manage AI projects" can be answered by chaining three hops:
- Find projects with "AI Features" as a focus.
- Identify the departments managing those projects.
- Find employees working in those departments.

This ability to chain relationships, guided by your graph schema, goes well beyond simple keyword matching or semantic similarity. It’s about understanding the organizational structure and project dependencies.
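The three steps above amount to a chain of typed hops. Here is a minimal pure-Python sketch over a hand-built edge list; the neighbors helper, the PROJECTS property table, the PART_OF edge, and the extra employee Carol are assumptions for illustration (the sample documents never state which department owns each project), and a graph database could express the same chain as a single path query.

```python
# Hypothetical Acme Corp graph: project properties plus directed, labeled edges
PROJECTS = {"Project X": {"focus": "AI Features"}, "Project Y": {"focus": "Database Performance"}}
EDGES = [
    ("Alice", "WORKS_IN", "Engineering"),
    ("Bob", "WORKS_IN", "Engineering"),
    ("Carol", "WORKS_IN", "Sales"),  # hypothetical extra employee
    ("Project X", "PART_OF", "Engineering"),  # assumed ownership
]


def neighbors(nodes, relation, reverse=False):
    """Follow edges labeled `relation` out of `nodes` (or into them when reverse=True)."""
    if reverse:
        return {src for src, rel, dst in EDGES if rel == relation and dst in nodes}
    return {dst for src, rel, dst in EDGES if rel == relation and src in nodes}


# Step 1: projects whose focus is AI features
ai_projects = {name for name, props in PROJECTS.items() if props["focus"] == "AI Features"}
# Step 2: departments those projects belong to
departments = neighbors(ai_projects, "PART_OF")
# Step 3: employees who work in those departments (walk WORKS_IN edges backwards)
employees = neighbors(departments, "WORKS_IN", reverse=True)

print(sorted(employees))  # prints: ['Alice', 'Bob']
```

Each hop narrows the candidate set using only edges whose label the schema defines, which is why Carol, who works in a department with no AI project, never appears.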
The next step is exploring how to handle schema evolution and more complex, domain-specific relationship inference.