Neo4j’s shortest path algorithms aren’t just about finding the quickest route; they fundamentally alter how you reason about graph connectivity by offering different trade-offs between accuracy, speed, and computational cost.
Let’s see Dijkstra’s algorithm in action, finding the shortest path between two nodes in a graph representing flight routes, where the "cost" is flight duration.
// Sample Data: Nodes for airports, relationships for flights with duration property
CREATE (LAX:Airport {name: 'LAX'})
CREATE (SFO:Airport {name: 'SFO'})
CREATE (DEN:Airport {name: 'DEN'})
CREATE (ORD:Airport {name: 'ORD'})
CREATE (JFK:Airport {name: 'JFK'})
CREATE (LAX)-[:FLIGHT {duration: 1.5}]->(SFO)
CREATE (LAX)-[:FLIGHT {duration: 2.5}]->(DEN)
CREATE (SFO)-[:FLIGHT {duration: 1.0}]->(DEN)
CREATE (DEN)-[:FLIGHT {duration: 3.0}]->(ORD)
CREATE (DEN)-[:FLIGHT {duration: 4.0}]->(JFK)
CREATE (ORD)-[:FLIGHT {duration: 1.5}]->(JFK)
// Dijkstra's algorithm query
MATCH (start:Airport {name: 'LAX'}), (end:Airport {name: 'JFK'})
CALL apoc.algo.dijkstra(start, end, 'FLIGHT', 'duration') YIELD path, weight
RETURN path, weight
The output shows the shortest path found by Dijkstra:
{
"path": {
"start": {"Airport": {"name": "LAX"}},
"end": {"Airport": {"name": "JFK"}},
"segments": [
{"start": {"Airport": {"name": "LAX"}}, "end": {"Airport": {"name": "DEN"}}, "relationship": {"type": "FLIGHT", "properties": {"duration": 2.5}}},
{"start": {"Airport": {"name": "DEN"}}, "end": {"Airport": {"name": "JFK"}}, "relationship": {"type": "FLIGHT", "properties": {"duration": 4.0}}}
],
"length": 2
},
"weight": 6.5
}
This demonstrates Dijkstra finding the path LAX -> DEN -> JFK with a total duration of 6.5 hours, as opposed to the longer LAX -> SFO -> DEN -> JFK path.
The core problem these algorithms solve is efficiently traversing a graph to find an optimal path between two points, where "optimal" is defined by a cost or weight associated with the edges. In our flight example, we want the path with the minimum total flight duration.
Dijkstra’s algorithm works by maintaining a set of visited nodes and, for each node, the shortest known distance from the start node. It iteratively selects the unvisited node with the smallest known distance, marks it as visited, and updates the distances of its neighbors. This greedy approach guarantees finding the shortest path in graphs with non-negative edge weights.
A* (pronounced "A-star") is an informed search algorithm that builds upon Dijkstra. It uses a heuristic function to estimate the cost from the current node to the target. The priority of a node in A* is not just its cost from the start (g(n)) but also the estimated cost to the goal (h(n)), making the priority f(n) = g(n) + h(n). This heuristic guides the search more directly towards the target, often making it faster than Dijkstra on large graphs, provided the heuristic is admissible (never overestimates the actual cost).
Breadth-First Search (BFS) is the simplest of the three. It explores the graph level by level, finding the shortest path in terms of the number of edges (hops), not edge weights. If all edge weights are equal (or implicitly 1), BFS will find the shortest path. It’s computationally very efficient for unweighted graphs.
The apoc.algo.dijkstra procedure in Neo4j (part of the APOC library) is a powerful implementation. You specify the start and end nodes, the relationship type(s) to traverse, and the property on those relationships that represents the weight or cost. For A*, you’d use apoc.algo.aStar and provide a heuristic function. For BFS, Neo4j’s native shortestPath function without a weight parameter effectively performs a BFS.
The most surprising part for many is how the choice of algorithm drastically impacts performance and even the result if edge weights aren’t uniform. Dijkstra and A* are essential for weighted graphs where minimizing a specific cost (like time, distance, or monetary expense) is critical. BFS is deceptively simple but incredibly fast for finding the path with the fewest connections, which is often a valid proxy for "shortest" in many network analyses where hop count matters more than link cost.
The heuristic function in A* is critical. A poor heuristic can degrade A* performance to that of Dijkstra, while a good one can make it significantly faster. For geographic paths, the straight-line (Euclidean) distance is a common and effective heuristic.
Understanding these algorithms allows you to choose the right tool for the job, balancing the need for an absolutely optimal weighted path against the speed requirements of your application.
The next logical step is exploring how to handle graphs with negative edge weights or cycles, which these algorithms, as presented, cannot directly solve.