Neo4j on Kubernetes, when deployed via Helm, isn’t just a database in a container; it’s a distributed, fault-tolerant graph database system designed to scale and self-heal.
Let’s see it in action. Imagine you have a basic Helm chart for Neo4j. You’ve customized values.yaml to specify your desired replica count, storage class, and Neo4j version.
# values.yaml example snippet
replicaCount: 3
neo4j:
  version: "4.4.2"
  password: "my-secure-password"
  plugins:
    - name: "graph-data-science"
      version: "1.7.1"
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1"
      memory: "2Gi"
persistence:
  enabled: true
  storageClass: "standard"
  size: "10Gi"
When you run helm install my-neo4j neo4j/neo4j --values values.yaml, Helm renders the chart’s templates and creates the resulting Kubernetes resources. It doesn’t just deploy a single Neo4j pod; it creates a StatefulSet, which gives each Neo4j instance a stable network identity and its own persistent storage, both crucial for maintaining cluster state and data integrity. For a clustered deployment (as you’d want for high availability), the chart sets neo4j.cluster.discovery.mode to k8s and sets up headless services for peer discovery. Depending on the chart, neo4j.cluster.initial-members is either managed automatically or set explicitly for bootstrapping.
The core problem this solves is managing the complexity of a distributed database in a dynamic, ephemeral environment like Kubernetes. Without Helm, you’d be manually crafting StatefulSets, Services, PersistentVolumeClaims, and potentially ConfigMaps for Neo4j configuration, all while ensuring they correctly interact for clustering. Helm abstracts this away into a configurable chart.
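To give a sense of what the chart saves you from writing, here is a minimal hand-written sketch of the headless Service and StatefulSet for three instances. It is not the chart's actual output: the resource names, the app label, and the choice to expose only the Bolt port are assumptions made for illustration, and a real clustered deployment would need more ports and configuration.

```yaml
# Hypothetical hand-written manifests; names and labels are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: my-neo4j-headless
spec:
  clusterIP: None                  # headless: DNS resolves to the individual pod IPs
  selector:
    app: my-neo4j
  ports:
    - name: bolt
      port: 7687
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-neo4j
spec:
  serviceName: my-neo4j-headless   # pods get stable names: my-neo4j-0, my-neo4j-1, ...
  replicas: 3
  selector:
    matchLabels:
      app: my-neo4j
  template:
    metadata:
      labels:
        app: my-neo4j
    spec:
      containers:
        - name: neo4j
          image: neo4j:4.4.2
          ports:
            - name: bolt
              containerPort: 7687
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 2Gi
          volumeMounts:
            - name: data
              mountPath: /data     # data directory of the official image
  volumeClaimTemplates:            # one PersistentVolumeClaim per pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
```

Even this stripped-down version runs to several dozen lines, and it still lacks authentication, plugin installation, and cluster settings, which is the gap the chart's values.yaml is meant to close.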
Internally, when you configure Neo4j for clustering (e.g., neo4j.cluster.discovery.mode: k8s), the Neo4j instances running in the pods discover each other through Kubernetes rather than a static address list: each instance looks up its peers via the headless service associated with the StatefulSet. The initial-members configuration helps bootstrap this process, ensuring a quorum is formed. With three members, for example, a Raft majority is two, so the cluster keeps operating if one instance is lost. The neo4j.cluster.raft.election-timeout and neo4j.cluster.raft.heartbeat-interval parameters control how the Raft consensus algorithm maintains consistency across the cluster, and Helm makes them easily adjustable.
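As a rough illustration, the clustering settings named above might be written in values.yaml as follows. The key names are taken from the text and have not been checked against any particular chart; the list format for initial-members, the release and service names, the default namespace, port 5000, and both Raft values are assumptions or placeholders.

```yaml
# values.yaml fragment; verify every key against your chart's documentation.
replicaCount: 3                    # three members: Raft majority is 2, tolerates 1 failure
neo4j:
  cluster:
    discovery:
      mode: k8s
    # Assumed format: <pod>.<headless-service>.<namespace>.svc.cluster.local:<port>,
    # one entry per StatefulSet pod. All names and the port are illustrative.
    initial-members:
      - my-neo4j-0.my-neo4j-headless.default.svc.cluster.local:5000
      - my-neo4j-1.my-neo4j-headless.default.svc.cluster.local:5000
      - my-neo4j-2.my-neo4j-headless.default.svc.cluster.local:5000
    raft:
      election-timeout: "7s"       # placeholder value, not a recommendation
      heartbeat-interval: "1s"     # placeholder; keep it well below the election timeout
```

The pod DNS names follow the standard StatefulSet pattern, which is what makes a static bootstrap list workable even though pod IPs change on rescheduling.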
The levers you control live primarily in the values.yaml file. You can configure:
- replicaCount: For simple deployments, this is just the number of pods. For clustering, it’s the desired number of nodes in your cluster.
- neo4j.version: Specifies the exact Neo4j Community or Enterprise Edition image to use.
- neo4j.password: The initial password for the neo4j user. For production, this should be managed via Kubernetes Secrets.
- neo4j.plugins: To add essential plugins like the Graph Data Science library.
- persistence.enabled, persistence.storageClass, persistence.size: To ensure your data survives pod restarts and rescheduling.
- neo4j.resources: CPU and memory requests/limits for your Neo4j pods, critical for performance and stability.
- neo4j.cluster.discovery.mode: Set to k8s for automatic discovery within Kubernetes.
- neo4j.cluster.initial-members: The list of initial members for bootstrapping a cluster.
- neo4j.cluster.raft.*: Fine-tuning Raft consensus parameters.
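For the password in particular, one way to keep it out of values.yaml is to store it in a Secret and pass it to the container as an environment variable. The sketch below assumes the official Neo4j image, which reads initial credentials from NEO4J_AUTH in the form neo4j/<password>; the Secret name is hypothetical, and how a given chart lets you reference such a Secret varies, so treat the container fragment as the underlying mechanism rather than a chart setting.

```yaml
# Hypothetical Secret; the name neo4j-auth is illustrative.
apiVersion: v1
kind: Secret
metadata:
  name: neo4j-auth
type: Opaque
stringData:
  NEO4J_AUTH: "neo4j/my-secure-password"
---
# Pod spec fragment (not a complete manifest)
containers:
  - name: neo4j
    image: neo4j:4.4.2
    env:
      - name: NEO4J_AUTH
        valueFrom:
          secretKeyRef:
            name: neo4j-auth
            key: NEO4J_AUTH
```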
An often-overlooked configuration detail is how Neo4j Enterprise Edition handles licensing. While the Helm chart deploys the software, you’ll need to ensure your Enterprise Edition license is correctly applied, typically by storing the license file in a Kubernetes Secret, mounting it into the pod, and referencing it in the Neo4j configuration. The chart might have a specific parameter for this, or you might need to inject it via custom configuration.
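The generic Kubernetes pattern for this looks roughly like the sketch below. The Secret name, the file name neo4j.license, and the mount path /licenses are all hypothetical, and the setting that points Neo4j at the mounted file depends on your chart and Neo4j version, so it is left out here.

```yaml
# Hypothetical Secret holding the license file; paste your own license text.
apiVersion: v1
kind: Secret
metadata:
  name: neo4j-license
type: Opaque
stringData:
  neo4j.license: |
    <contents of your license file>
---
# Pod spec fragment (not a complete manifest)
volumes:
  - name: license
    secret:
      secretName: neo4j-license
containers:
  - name: neo4j
    image: neo4j:4.4.2-enterprise
    volumeMounts:
      - name: license
        mountPath: /licenses       # hypothetical path; match it to your Neo4j configuration
        readOnly: true
```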
Once you have a cluster up and running, the next logical step is to integrate it with your applications, often via a Kubernetes Service of type LoadBalancer or NodePort for external access, or an internal ClusterIP Service for in-cluster consumption.
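A minimal sketch of the external-access case, assuming the pods carry a hypothetical app: my-neo4j label and listen on Neo4j's default HTTP (7474) and Bolt (7687) ports:

```yaml
# Hypothetical Service; the name and selector label are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: my-neo4j-external
spec:
  type: LoadBalancer               # use NodePort or ClusterIP depending on where clients run
  selector:
    app: my-neo4j
  ports:
    - name: http
      port: 7474
      targetPort: 7474
    - name: bolt
      port: 7687
      targetPort: 7687
```

For in-cluster consumers, changing type to ClusterIP keeps the same ports while avoiding a public endpoint.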