The Neo4j APOC (Awesome Procedures on Cypher) library is a collection of user-defined procedures and functions that extend Cypher, Neo4j’s graph query language, with utility operations the core language lacks.

Let’s see APOC in action. Imagine you have a graph representing social connections, and you want to find all friends of friends, but only those who are not direct friends. A basic Cypher query might look like this:

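MATCH (p:Person {name: "Alice"})-[:FRIENDS_WITH]->(friend)-[:FRIENDS_WITH]->(foaf)
// Exclude Alice's direct friends and Alice herself (reachable via a reciprocal friendship)
WHERE NOT (p)-[:FRIENDS_WITH]->(foaf) AND foaf <> p
RETURN DISTINCT foaf.name AS friendOfFriend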

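This works, and Cypher’s variable-length patterns can extend it to a range of hops. But what if you need finer control over the traversal, such as filtering by label, capping the number of results, or deciding how often a node may be revisited? This is where APOC shines. Procedures like apoc.path.expandConfig allow for much more configurable pathfinding.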

Here’s how you might use apoc.path.expandConfig to find friends of friends of Alice, traversing exactly 2 hops and excluding direct friends:

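MATCH (p:Person {name: "Alice"})
// The first argument is the start node itself, not a pattern string
CALL apoc.path.expandConfig(p, {
  relationshipFilter: 'FRIENDS_WITH>',
  minLevel: 2,
  maxLevel: 2
})
YIELD path
// The procedure yields paths; the last node of each path is the candidate
WITH p, last(nodes(path)) AS foaf
WHERE foaf <> p AND NOT (p)-[:FRIENDS_WITH]->(foaf)
RETURN DISTINCT foaf.name AS friendOfFriend

This query returns the friends of friends who are neither Alice nor her direct friends, and it shows the general shape of an APOC traversal. The start node is matched in Cypher and passed in as the first argument. The relationshipFilter specifies the type and direction of relationships to traverse, and minLevel and maxLevel control the path length. The procedure yields paths rather than nodes, so the end node is taken with last(nodes(path)), and the exclusion of Alice and her direct friends is applied in an ordinary WHERE clause after the call.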
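Hop limits are only part of what the configuration map accepts. The following is a minimal sketch, assuming the same Person nodes and FRIENDS_WITH relationships as above, that adds a label filter and a uniqueness rule. The choice of a 1 to 3 hop range is purely illustrative.

```cypher
MATCH (p:Person {name: "Alice"})
CALL apoc.path.expandConfig(p, {
  relationshipFilter: 'FRIENDS_WITH>',  // outgoing FRIENDS_WITH only
  labelFilter: '+Person',               // only traverse nodes labelled Person
  minLevel: 1,
  maxLevel: 3,                          // illustrative range, adjust as needed
  uniqueness: 'NODE_GLOBAL',            // visit each node at most once
  bfs: true                             // breadth-first expansion
})
YIELD path
RETURN last(nodes(path)).name AS person, length(path) AS hops
ORDER BY hops, person
```

With breadth-first expansion and global node uniqueness, each person should appear once, at the hop count where the traversal first reached them.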

APOC addresses a basic limitation: Cypher is a declarative language, which makes it excellent for describing graph patterns but awkward for procedural, iterative, or complex data manipulation tasks. Bulk data transformations, interaction with external systems, and intricate algorithms often require procedural logic. APOC bridges this gap with pre-built procedures that can be called directly from Cypher.

Internally, APOC procedures are written in Java and packaged as a JAR that is installed as a Neo4j plugin. They use Neo4j’s Java APIs to work with the graph directly. Neo4j registers the procedures and functions when the database starts; when a query calls one, Neo4j invokes the corresponding Java code, passing in parameters from your Cypher query and streaming the results back into the Cypher execution pipeline. This lets you blend declarative pattern matching with procedural execution.
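In practice, the registered code surfaces in two forms: procedures, which are invoked with CALL and stream rows through YIELD, and functions, which are used inline in expressions. The sketch below assumes APOC is installed and that the Person data from the earlier examples exists. It contains three separate statements, meant to be run one at a time.

```cypher
// List APOC entries whose name matches "path"; the type column
// distinguishes procedures from functions.
CALL apoc.help('path')
YIELD type, name, signature
RETURN type, name, signature
ORDER BY type, name;

// A procedure is invoked with CALL and streams rows via YIELD.
MATCH (p:Person {name: "Alice"})
CALL apoc.path.expandConfig(p, {relationshipFilter: 'FRIENDS_WITH>', minLevel: 1, maxLevel: 1})
YIELD path
RETURN length(path) AS hops;

// A function is used inline, like any built-in Cypher function.
RETURN apoc.version() AS apocVersion;
```

Checking the signature with apoc.help before using an unfamiliar entry is a quick way to see whether it needs CALL ... YIELD or can be used inline.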

The apoc.coll module, for instance, offers a large set of collection functions. You can merge lists, remove duplicates, sort, compute intersections and differences, and much more. Consider a simple case: finding the position of a value in a list:

WITH ["id1", "id2", "id3"] AS listOfIds
// apoc.coll.indexOf is a function, so it is used inline rather than with CALL ... YIELD
RETURN apoc.coll.indexOf(listOfIds, 'id2') AS value

This would return 1, the zero-based index of 'id2' in the list (or -1 if the value is absent). You can combine this with apoc.coll.toSet and other functions to build complex data transformations.
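Because these are functions, they compose by nesting. A minimal sketch, using a made-up list with a duplicate and an unsorted order, that deduplicates, sorts, and then looks up a position:

```cypher
WITH ["id3", "id1", "id2", "id1"] AS listOfIds
// toSet removes the duplicate "id1"; sort orders the remaining values
WITH apoc.coll.sort(apoc.coll.toSet(listOfIds)) AS uniqueIds
RETURN uniqueIds,
       apoc.coll.indexOf(uniqueIds, 'id2') AS position,
       apoc.coll.contains(uniqueIds, 'id4') AS hasId4
```

Each step stays inside a single expression, so no CALL ... YIELD is involved and the result can feed directly into a later MATCH or WHERE clause.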

A key aspect many users overlook is the performance implications of certain APOC procedures. While APOC provides immense utility, calling procedures that perform extensive traversals or data manipulation within a single Cypher query can still be resource-intensive. For example, using apoc.graph.fromData to construct a subgraph and then querying it can be powerful, but if the input data is massive, it might become a bottleneck. Always consider the scale of your data and the specific APOC procedure you’re using; sometimes, breaking down a complex operation into multiple, smaller Cypher statements or optimizing the APOC procedure’s parameters can significantly improve performance.
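One common way to break a large operation into smaller units is apoc.periodic.iterate, which runs an action statement in batches over the rows produced by a driving statement. The sketch below is an illustration only: the nameLower property and the batch size of 1000 are assumptions chosen for the example, not recommendations.

```cypher
CALL apoc.periodic.iterate(
  // Driving statement: selects the nodes that still need work
  "MATCH (p:Person) WHERE p.nameLower IS NULL RETURN p",
  // Action statement: applied to each row, committed once per batch
  "SET p.nameLower = toLower(p.name)",
  {batchSize: 1000, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```

Committing in batches keeps each transaction small, which tends to matter more as the data grows. Whether it helps in a given case depends on the workload, so it is worth measuring against the single-statement version.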

The next step in mastering APOC is to explore its capabilities for data import and export, such as apoc.load.json and apoc.export.cypher.
