Jenkins high availability is usually achieved by running multiple Jenkins masters, each with its own set of agents, behind a load balancer.
Here’s how you can set this up in an Active-Active configuration:
Let’s say we have two Jenkins masters, jenkins-master-1 and jenkins-master-2. Both run the same Jenkins version, are configured identically, and point to the same shared storage for their home directories.
The Goal: Seamless Failover and Increased Throughput
The primary goal of an active-active setup is to ensure that if one Jenkins master goes down, the other can seamlessly take over, minimizing downtime. Additionally, by distributing the load across multiple masters, you can handle more concurrent builds and improve overall performance.
Core Components
- Multiple Jenkins Masters: You’ll need at least two separate Jenkins server instances.
- Shared Storage: All Jenkins masters must read and write to the exact same Jenkins home directory ($JENKINS_HOME). This is the most critical piece.
- Load Balancer: A load balancer sits in front of your Jenkins masters, distributing incoming requests and directing traffic to the available masters.
- Agent Management: How agents connect to the masters needs careful consideration.
Setting Up Shared Storage
This is where most people stumble. Jenkins masters are not designed to run concurrently against the same filesystem unless that filesystem is specifically designed for this kind of concurrent access and is configured correctly.
Option 1: Network File System (NFS)
- Diagnosis: If you try to start two Jenkins masters pointing to the same NFS mount without proper locking, you’ll likely see file corruption or one master overwriting the other’s state.
- Configuration:
- Mount the NFS share on both Jenkins master servers. For example, on jenkins-master-1 and jenkins-master-2:

    sudo mount -t nfs -o vers=4 nfs-server-ip:/path/to/jenkins_home /var/lib/jenkins

- Crucially, use NFSv4 (requested above with vers=4), which has better locking mechanisms.
- Configure NFS server exports with sync,no_subtree_check and ensure user/group IDs are consistent across servers.
- Why it works: NFSv4 provides file locking, which is essential for preventing race conditions where both Jenkins masters try to write to the same file simultaneously.
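Because the locking behavior depends on the mount actually using NFSv4, it can be worth checking this on each master before starting Jenkins. The following is a minimal sketch, assuming a Linux host where /proc/mounts lists NFSv4 mounts with the filesystem type nfs4. The function name is_nfs4_mount is hypothetical, not part of any tool.

```shell
#!/bin/sh
# is_nfs4_mount MOUNTPOINT [MOUNTS_FILE]
# Succeeds (exit 0) if MOUNTPOINT appears in MOUNTS_FILE with type "nfs4".
# MOUNTS_FILE defaults to /proc/mounts, which is a Linux-specific assumption.
is_nfs4_mount() {
    mountpoint="$1"
    mounts_file="${2:-/proc/mounts}"
    # Fields in /proc/mounts: device, mount point, fs type, options, ...
    awk -v mp="$mountpoint" '
        $2 == mp && $3 == "nfs4" { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$mounts_file"
}
```

A possible use on each master would be `is_nfs4_mount /var/lib/jenkins || echo "not an NFSv4 mount" >&2`.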
Option 2: Distributed File System (e.g., GlusterFS, CephFS)
- Diagnosis: Similar to NFS, direct concurrent access to non-distributed storage will lead to data corruption.
- Configuration:
- Set up a GlusterFS or CephFS cluster.
- Create a distributed volume for $JENKINS_HOME.
- Mount this volume on both Jenkins master nodes.

    # Example for GlusterFS
    sudo mount -t glusterfs gluster-server-ip:/jenkins_volume /var/lib/jenkins
- Why it works: These systems are designed for concurrent access and provide robust data consistency and fault tolerance across multiple nodes, inherently handling the locking and replication needed.
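Once the volume is mounted on both nodes, a quick sanity check is to write a marker on one master and read it back on the other. This is only a sketch: the helper names and the .ha-marker file name are made up for illustration, and it confirms that both nodes see the same volume, not that concurrent Jenkins writes are safe.

```shell
#!/bin/sh
# write_marker DIR TOKEN
# Writes TOKEN to DIR/.ha-marker (a hypothetical file name).
write_marker() {
    printf '%s\n' "$2" > "$1/.ha-marker"
}

# marker_matches DIR TOKEN
# Succeeds if DIR/.ha-marker exists and contains exactly TOKEN.
marker_matches() {
    [ -f "$1/.ha-marker" ] && [ "$(cat "$1/.ha-marker")" = "$2" ]
}
```

For instance, you might run `write_marker /var/lib/jenkins check-1` on jenkins-master-1 and then `marker_matches /var/lib/jenkins check-1` on jenkins-master-2.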
Option 3: Cloud Object Storage with a FUSE adapter (less common for primary $JENKINS_HOME, more for artifacts)
- Diagnosis: Direct object storage access isn’t suitable for the transactional nature of $JENKINS_HOME.
- Configuration: You’d typically use a FUSE (Filesystem in Userspace) driver that presents object storage (like S3) as a local filesystem. However, this is generally not recommended for the core $JENKINS_HOME due to latency and consistency issues for frequent small writes. It’s better suited for build artifacts.
- Why it works (conceptually, with caveats): The FUSE driver translates filesystem operations into object storage API calls. However, object storage lacks filesystem semantics such as atomic renames and file locking, and some object stores offer only eventual consistency, which is problematic for the immediate needs of a running Jenkins instance.
Load Balancer Configuration
- Diagnosis: Without a load balancer, requests only go to one master, defeating the purpose of HA. If a master fails, users can’t access Jenkins.
- Configuration:
- Use a tool like HAProxy, Nginx, or a cloud provider’s load balancer (e.g., AWS ELB, GCP Load Balancer).
- Configure it to listen on port 80/443 and forward traffic to your Jenkins masters on their respective HTTP ports (e.g., 8080).
- Health Checks: This is vital. The load balancer must actively check if the Jenkins masters are responsive.
- HAProxy Example:

    listen jenkins
        bind *:80
        mode http
        balance roundrobin
        option httpchk GET /login
        server jenkins-master-1 192.168.1.10:8080 check
        server jenkins-master-2 192.168.1.11:8080 check

- Nginx Example:

    http {
        upstream jenkins_backend {
            server 192.168.1.10:8080;
            server 192.168.1.11:8080;
        }
        server {
            listen 80;
            location / {
                proxy_pass http://jenkins_backend;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
            }
        }
    }

- Health Check Endpoint: Jenkins exposes /login, which returns a 200 OK if it’s up and running. This is a good endpoint for load balancer health checks.
- Why it works: The load balancer routes traffic only to healthy Jenkins instances. If one instance fails its health check, the load balancer stops sending traffic to it, ensuring users are directed to the available master.
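To see what the load balancer’s health check sees, you can probe /login on each master by hand. The sketch below assumes curl is installed and reuses the example addresses from the configurations above. The function names are hypothetical, and treating only 200 as healthy mirrors the health check endpoint described above.

```shell
#!/bin/sh
# http_status URL
# Prints the HTTP status code for URL (curl prints 000 if the request fails).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# is_healthy_status CODE
# Succeeds only when CODE is 200.
is_healthy_status() {
    [ "$1" = "200" ]
}

# check_masters URL...
# Prints one UP/DOWN line per URL.
check_masters() {
    for url in "$@"; do
        code=$(http_status "$url")
        if is_healthy_status "$code"; then
            echo "UP   $url"
        else
            echo "DOWN $url (status $code)"
        fi
    done
}
```

A possible invocation: `check_masters http://192.168.1.10:8080/login http://192.168.1.11:8080/login`.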
Agent Management
- Diagnosis: Agents need to be able to connect to any available Jenkins master, or at least be re-assigned gracefully. If agents are statically configured to connect to a specific master, they will fail if that master goes down.
- Configuration:
- Dynamic Agent Provisioning: Use tools like Kubernetes, EC2 plugin, or Docker Swarm. These systems allow agents to be spawned dynamically and often register themselves with the available Jenkins masters.
- JNLP Agents: If using JNLP agents, configure them with the load balancer’s address, not a specific master’s IP. The agent will connect to the load balancer, which will then forward it to an available master.

    # In agent's launch script or config
    java -jar agent.jar -jnlpUrl http://your-jenkins-lb.com/computer/your-agent-name/jenkins-agent.jnlp

- SSH Agents: With SSH agents, the master initiates the connection to the agent rather than the other way around, so there is nothing on the agent to point at the load balancer. Each master needs network access to the agent hosts and the same SSH credentials, which it reads from the shared $JENKINS_HOME.
- Why it works: By connecting agents through the load balancer or using dynamic provisioning, agents can automatically find and connect to a healthy Jenkins master, ensuring build jobs can continue even if one master is unavailable.
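For JNLP agents, reconnecting through the load balancer only helps if the agent process is restarted after its connection drops. The following is a rough sketch of a relaunch wrapper built around the agent command shown above. The function names, the 60-second cap, and the doubling backoff are illustrative assumptions, and it expects agent.jar in the current directory.

```shell
#!/bin/sh
# agent_jnlp_url BASE_URL AGENT_NAME
# Builds the agent's JNLP URL from the load balancer address and agent name.
agent_jnlp_url() {
    # ${1%/} drops one trailing slash from BASE_URL, if present.
    printf '%s/computer/%s/jenkins-agent.jnlp\n' "${1%/}" "$2"
}

# next_delay CURRENT MAX
# Prints CURRENT doubled, capped at MAX (seconds).
next_delay() {
    doubled=$(( $1 * 2 ))
    if [ "$doubled" -gt "$2" ]; then
        echo "$2"
    else
        echo "$doubled"
    fi
}

# run_agent BASE_URL AGENT_NAME
# Relaunches the agent forever: the delay doubles after each failed run
# (capped at 60s) and resets to 1s after a clean exit.
run_agent() {
    url=$(agent_jnlp_url "$1" "$2")
    delay=1
    while true; do
        if java -jar agent.jar -jnlpUrl "$url"; then
            delay=1
        else
            delay=$(next_delay "$delay" 60)
        fi
        sleep "$delay"
    done
}
```

You might start it with `run_agent http://your-jenkins-lb.com your-agent-name`. In practice a service manager could play the same role as the loop.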
Important Considerations
- $JENKINS_HOME Consistency: This cannot be stressed enough. Every file, every configuration change, every plugin update must be visible and identical to both masters.
- Plugin Compatibility: Ensure all plugins are compatible with clustered environments. Some older plugins might not handle this well.
- State Management: Jobs, build history, user data – all reside in $JENKINS_HOME. If this isn’t shared and consistent, your cluster will fail.
- Security: Ensure your load balancer is configured for SSL/TLS termination if you’re using HTTPS.
- Agent Affinity: If you have agents that must run on a specific master (e.g., for hardware reasons), you’ll need a more complex setup, potentially involving sticky sessions on the load balancer or custom agent assignment logic. However, for true HA, agents should ideally be treated as fungible.
The next hurdle you’ll likely encounter is ensuring that your build artifacts are also consistently stored and accessible, or dealing with plugin updates across multiple masters.