Neo4j’s spatial indexing lets you store geographic coordinates as properties on graph nodes and ask questions like "show me all restaurants within 5 miles of this point" without scanning every node in the database.
Here’s a Neo4j graph with some location data. No plugin is needed: the point type has been built into Cypher since Neo4j 3.4. We’ve got City nodes with a location property (a native point value, where x is the longitude and y is the latitude) and Business nodes, also with a location.
CREATE (london:City {name: "London", location: point({srid: 4326, x: -0.1278, y: 51.5074})})
CREATE (paris:City {name: "Paris", location: point({srid: 4326, x: 2.3522, y: 48.8566})})
CREATE (restaurant1:Business {name: "The French Laundry", type: "Restaurant", location: point({srid: 4326, x: -0.1300, y: 51.5050})})
CREATE (cafe1:Business {name: "Le Petit Cafe", type: "Cafe", location: point({srid: 4326, x: 2.3500, y: 48.8500})})
CREATE (restaurant2:Business {name: "Pizza Palace", type: "Restaurant", location: point({srid: 4326, x: -0.1250, y: 51.5100})})
Now, let’s ask a spatial query: "Find all businesses within 1000 meters of the center of London."
MATCH (london:City {name: "London"})
MATCH (b:Business)
WHERE distance(london.location, b.location) <= 1000
RETURN b.name, b.type, distance(london.location, b.location) AS distance_meters
This query uses the distance() function, which for WGS 84 points returns the great-circle distance in meters, treating the Earth as a sphere. (distance() is the name used up to Neo4j 4.x; it was deprecated in 4.4 and replaced by point.distance() in Neo4j 5.) Without a spatial index, Neo4j has to compute the distance for every single Business node. For a small dataset, this is fine. For millions of businesses, it’s a non-starter.
Neo4j’s built-in indexes can store point values, which lets the planner quickly prune the search space. You create the index on a node label and the property that holds the point.
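If you are on Neo4j 5, where distance() has been replaced by point.distance(), the same query would look roughly like this. It is a sketch that assumes the sample data created above and changes nothing except the function name and an added ORDER BY.

```cypher
// Neo4j 5 form of the query above: point.distance() instead of distance().
MATCH (london:City {name: "London"})
MATCH (b:Business)
WHERE point.distance(london.location, b.location) <= 1000
RETURN b.name, b.type,
       point.distance(london.location, b.location) AS distance_meters
ORDER BY distance_meters
```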
CREATE INDEX FOR (b:Business) ON (b.location)
On Neo4j 4.x this command creates a B-tree index on the location property of Business nodes; point values are indexed by mapping them onto a space-filling curve. (On Neo4j 5 a plain CREATE INDEX creates a range index, which does not serve distance queries; use CREATE POINT INDEX there.) When you run a spatial query, the planner can recognize a distance() comparison against a radius, or a bounding-box check with point.withinBBox(), and use the index. Instead of scanning all nodes, it uses the index to find candidate nodes that might be within the specified distance, and then it performs the exact distance calculation only on those candidates.
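On Neo4j 5 the index has to be declared as a point index explicitly. The following is a minimal sketch, assuming Neo4j 5; the index name business_location is illustrative and not part of the original example. The EXPLAIN query is one way to check whether the planner actually picks the index up.

```cypher
// Assumed: Neo4j 5. "business_location" is an illustrative index name.
CREATE POINT INDEX business_location IF NOT EXISTS
FOR (b:Business) ON (b.location);

// EXPLAIN shows the plan without running the query.
// If the index is used, the plan should contain an index seek on
// Business(location) rather than a scan over every Business node.
EXPLAIN
MATCH (b:Business)
WHERE point.distance(b.location, point({srid: 4326, x: -0.1278, y: 51.5074})) <= 1000
RETURN b.name;
```

The index can only help when the indexed property appears directly inside the distance comparison, as b.location does here.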
The real power comes when you combine graph traversal with spatial queries. Imagine you want to find all restaurants within 500 meters of any business that is itself within 2000 meters of London.
MATCH (london:City {name: "London"})
MATCH (nearby_business:Business)
WHERE distance(london.location, nearby_business.location) <= 2000
MATCH (restaurant:Business {type: "Restaurant"})
WHERE distance(nearby_business.location, restaurant.location) <= 500
RETURN DISTINCT restaurant.name, restaurant.type, distance(london.location, restaurant.location) AS distance_from_london
ORDER BY distance_from_london
This query chains two spatial filters. The first finds every nearby_business within 2000 meters of London, which the spatial index can answer directly. Then, for each of those nearby_business nodes, Neo4j looks for Restaurant nodes close to that specific nearby_business, and the planner can use the index again for the second distance() check; run the query with EXPLAIN to confirm what it actually chose. The DISTINCT keyword ensures we don’t get duplicate restaurants if they are close to multiple nearby_business nodes. Note that a restaurant inside the 2000-meter circle also matches as its own nearby_business, at distance 0.
The point data type in Neo4j supports four coordinate reference systems, each identified by an SRID (Spatial Reference System Identifier): 4326 (WGS 84, which is what GPS uses), 4979 (WGS 84 with height), 7203 (2D Cartesian), and 9157 (3D Cartesian). Projected systems such as a local UTM zone have no SRID of their own in Neo4j; you either store that data as Cartesian points, where distance is plain Euclidean distance in the units of the coordinates, or convert it to WGS 84 before loading. Neo4j does not transform between reference systems: the distance between two points with different SRIDs is null, so mixed data silently drops out of distance filters.
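The defaults and the mixed-SRID behavior can be checked with a query that needs no stored data. This sketch uses point.distance(), so it assumes Neo4j 4.4 or later; on older versions substitute distance().

```cypher
// x/y with no srid defaults to 2D Cartesian (SRID 7203);
// longitude/latitude defaults to WGS 84 (SRID 4326).
WITH point({x: 3, y: 4}) AS cartesian,
     point({longitude: -0.1278, latitude: 51.5074}) AS geographic
RETURN cartesian.srid AS cartesian_srid,
       geographic.srid AS geographic_srid,
       // Euclidean distance in coordinate units for Cartesian points
       point.distance(point({x: 0, y: 0}), cartesian) AS euclidean,
       // Points in different reference systems are not converted: null
       point.distance(cartesian, geographic) AS mixed
```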
When a distance predicate is answered from the index, Neo4j works in two steps. It first derives a bounding box around the query circle and uses the index to fetch only the points that fall inside that box. It then runs the precise distance calculation on those candidates and discards the ones in the corners of the box that lie outside the radius. This two-step process is a key optimization: it dramatically reduces the number of expensive, precise calculations needed.
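The same two steps can be written out by hand with point.withinBBox(), which is useful when you already know the rectangle you care about. This is a sketch, assuming Neo4j 4.4 or later; the corner coordinates are illustrative values chosen to enclose the 1000-meter circle around central London.

```cypher
// Assumed: Neo4j 4.4+ (point.withinBBox and point.distance).
// The corner coordinates are illustrative, not computed by Neo4j.
WITH point({srid: 4326, x: -0.1278, y: 51.5074}) AS center,
     point({srid: 4326, x: -0.1500, y: 51.4950}) AS lower_left,
     point({srid: 4326, x: -0.1050, y: 51.5200}) AS upper_right
MATCH (b:Business)
// Step 1: cheap rectangular filter
WHERE point.withinBBox(b.location, lower_left, upper_right)
// Step 2: exact distance on the candidates that survive step 1
  AND point.distance(center, b.location) <= 1000
RETURN b.name, point.distance(center, b.location) AS distance_meters
```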
The most surprising thing about Neo4j’s spatial capabilities is how seamlessly it integrates geographic querying into the familiar graph traversal paradigm. You don’t have to switch to a separate geospatial database; you can mix and match graph relationships with spatial proximity in a single query, leveraging the strengths of both. This allows for incredibly powerful analyses, like finding connected components in a network that are also geographically clustered.
If you’re dealing with data that has a geographic shape, like "delivery routes" or "customer service areas," keep in mind that points are the only geometry Cypher stores natively. For example, a Route node could have a geometry property storing a LineString as WKT (Well-Known Text), but querying for businesses along that route needs the separate Neo4j Spatial plugin (neo4j-contrib/spatial), which can index WKT geometries in layers and test them against other shapes. The apoc.spatial procedures do not cover this: they provide geocoding and distance-sorting helpers, not geometry predicates.
The next step is exploring how to use the Neo4j Spatial plugin for more complex operations and different geometry types like polygons and lines.