Neo4j’s neo4j-admin import tool can ingest massive CSV datasets, but it’s not just about shoveling data in; it’s about telling Neo4j the shape of your graph before it even starts.

Let’s see it in action. Imagine you have three CSVs: users.csv, purchases.csv, and products.csv.

users.csv:

userId:ID(User),name,age
1,Alice,30
2,Bob,25
3,Charlie,35

purchases.csv:

:START_ID(User),:END_ID(Product),purchaseDate,:TYPE
1,101,2023-01-15,BOUGHT
1,102,2023-02-20,BOUGHT
2,101,2023-03-10,BOUGHT

And products.csv, which supplies the Product nodes that the purchases point to:

productId:ID(Product),productName
101,Laptop
102,Keyboard

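To import this, you’d run a command like the one below against a database that does not yet contain data. Each node file gets its label on the command line; the relationship type comes from the :TYPE column, so the relationships file needs no prefix. The tool writes directly into the store of the database named by --database, so there is no output directory to specify. (This is the Neo4j 4.x form; in Neo4j 5 the equivalent command is neo4j-admin database import full.)

neo4j-admin import \
  --database=neo4j \
  --nodes=User=users.csv \
  --nodes=Product=products.csv \
  --relationships=purchases.csv \
  --skip-bad-relationships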

This command doesn’t just load data; it defines your graph structure. The --nodes and --relationships flags are where the magic happens. For nodes, you provide a label (e.g., User) and the path to the CSV. Each node file needs one column marked :ID, here userId:ID(User); it does not have to be the first column. The name in parentheses is an ID space: IDs must be unique within it, and it is separate from the label, even though the two share a name in this example. For relationships, :START_ID(User) and :END_ID(Product) name the ID spaces in which the start and end nodes are looked up, and the :TYPE column (the last one in purchases.csv) supplies the relationship type. Columns without a directive, such as name, age, and purchaseDate, become properties, stored as strings unless the header declares a type such as age:int.
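To make the header syntax concrete, here is a minimal Python sketch that splits a header line into its parts: the property name, the directive or data type after the colon, and the name in parentheses. The names HeaderField, parse_header_field, and parse_header are hypothetical helpers for illustration; this is not the import tool's own parser and it ignores quoting and other delimiters.

```python
import re
from typing import List, NamedTuple, Optional


class HeaderField(NamedTuple):
    name: str             # property name; empty for fields like ":TYPE"
    kind: Optional[str]   # ID, START_ID, END_ID, TYPE, or a data type; None if absent
    group: Optional[str]  # the name in parentheses, e.g. "User"; None if absent


# name, then optionally ":" + kind, then optionally "(" + group + ")"
_FIELD = re.compile(
    r"^(?P<name>[^:]*)"
    r"(?::(?P<kind>[A-Za-z_]+(?:\[\])?)"
    r"(?:\((?P<group>[^)]*)\))?)?$"
)


def parse_header_field(field: str) -> HeaderField:
    """Split one header field such as 'userId:ID(User)' into its parts."""
    match = _FIELD.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognised header field: {field!r}")
    return HeaderField(match.group("name"), match.group("kind"), match.group("group"))


def parse_header(line: str) -> List[HeaderField]:
    """Parse a comma-separated header line (no quoting support in this sketch)."""
    return [parse_header_field(field) for field in line.split(",")]


# Headers taken from the sample files above.
for header in (
    "userId:ID(User),name,age",
    ":START_ID(User),:END_ID(Product),purchaseDate,:TYPE",
):
    for field in parse_header(header):
        print(field)
```

Running it on the two sample headers shows that userId carries both a property name and an ID directive, while :START_ID, :END_ID, and :TYPE have no property name at all.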

The core problem neo4j-admin import solves is the performance bottleneck of creating nodes and relationships one by one in a running database. Instead, it builds the graph data files (nodes, relationships, properties) offline, optimized for Neo4j’s storage engine. This means it can ingest billions of relationships in hours, not days or weeks. The key is that the schema definition in your CSVs, especially the :ID and :TYPE directives, is paramount. Without them, Neo4j doesn’t know how to correctly link your data into a graph.

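Broadly, the importer works in stages. It writes the node records and builds a map from your input IDs to internal node IDs, then writes the relationship records by looking up each :START_ID and :END_ID in that map, and finally links the relationships together and computes counts. The --skip-bad-relationships flag is a lifesaver for large imports; it prevents the entire import from failing if a few relationships point to non-existent nodes. Instead, the offending rows are recorded in a report file (import.report by default) and the import continues. The number of bad entries it will tolerate is capped by --bad-tolerance (1000 by default), so a badly mismatched file still aborts the import.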
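Because skipped relationships are dropped from the graph, it can be worth checking for them before a long import. The following is a rough pre-flight sketch in Python, not part of neo4j-admin: node_ids and dangling_relationships are hypothetical helpers, the check assumes simple comma-separated files with one header line, and the last row of PURCHASES (product 999) is a made-up bad row added for the demonstration.

```python
import csv
import io


def node_ids(node_csv):
    """Collect the values of the :ID column from a node CSV file object."""
    reader = csv.reader(node_csv)
    header = next(reader)
    id_col = next(i for i, h in enumerate(header) if ":ID" in h)
    return {row[id_col] for row in reader if row}


def dangling_relationships(rel_csv, start_ids, end_ids):
    """Return (line_number, row) for relationships whose endpoints are missing."""
    reader = csv.reader(rel_csv)
    header = next(reader)
    start_col = next(i for i, h in enumerate(header) if h.startswith(":START_ID"))
    end_col = next(i for i, h in enumerate(header) if h.startswith(":END_ID"))
    bad = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue
        if row[start_col] not in start_ids or row[end_col] not in end_ids:
            bad.append((line_no, row))
    return bad


# Sample data from the article, plus one hypothetical row pointing at product 999.
USERS = "userId:ID(User),name,age\n1,Alice,30\n2,Bob,25\n3,Charlie,35\n"
PRODUCTS = "productId:ID(Product),productName\n101,Laptop\n102,Keyboard\n"
PURCHASES = (
    ":START_ID(User),:END_ID(Product),purchaseDate,:TYPE\n"
    "1,101,2023-01-15,BOUGHT\n"
    "1,102,2023-02-20,BOUGHT\n"
    "2,101,2023-03-10,BOUGHT\n"
    "3,999,2023-04-01,BOUGHT\n"
)

users = node_ids(io.StringIO(USERS))
products = node_ids(io.StringIO(PRODUCTS))
bad = dangling_relationships(io.StringIO(PURCHASES), users, products)
print(bad)  # [(5, ['3', '999', '2023-04-01', 'BOUGHT'])]
```

This holds every node ID in memory, so it suits spot checks on samples or moderately sized files; for real files, pass open file handles instead of io.StringIO.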

An option that catches many people out is --id-type. It describes the IDs in your input files, not Neo4j’s internal record IDs, and takes one of string, integer, or actual. The default is string, which accepts any value. If every ID in your files is numeric, --id-type=integer can reduce the memory the importer needs to map input IDs to nodes, which starts to matter once you reach billions of nodes. The catch is that the setting applies to the whole import: with integer, a single non-numeric ID makes the import fail, even if your CSVs correctly use the :ID(...) syntax. The actual setting is an advanced option that treats the input IDs as the internal node IDs themselves.
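A quick way to decide which --id-type value fits your input is to check whether every value in the :ID columns is a plain integer. The sketch below is a hypothetical helper, suggest_id_type, not something the import tool provides, and it assumes that only unsigned decimal digits count as integer IDs.

```python
import re

# Assumption for this sketch: an "integer" ID is one or more ASCII digits.
_INTEGER = re.compile(r"[0-9]+")


def suggest_id_type(ids):
    """Return 'integer' if every ID is all digits, otherwise 'string'."""
    ids = list(ids)
    if ids and all(_INTEGER.fullmatch(value) for value in ids):
        return "integer"
    return "string"


# IDs from users.csv and products.csv in the example above.
print(suggest_id_type(["1", "2", "3", "101", "102"]))
# A hypothetical mixed set with one non-numeric ID.
print(suggest_id_type(["1", "2", "u-3"]))
```

Run it over the IDs of every node file in the import, since one non-numeric value anywhere is enough to rule out integer.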

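Once the import completes successfully, the next thing to deal with is indexes and constraints. neo4j-admin import writes only data; it does not create any schema. After starting the database, create the indexes and constraints you need yourself, for example a uniqueness constraint on the userId property of User nodes. Until you do, lookups by that property have to scan every User node.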
