Docker inside Docker (DinD) on a Jenkins agent is a surprisingly fragile beast, and the common pitfall is that the Docker daemon inside the agent container can’t actually start because it thinks it’s already running on the host.
Here’s how to make it work and the common landmines to avoid:
The Core Problem: PID 1 and Daemonization
The Docker daemon is designed to run as a long-lived system service, normally started and supervised by an init system such as systemd. Inside a container there is no init system: whatever the ENTRYPOINT or CMD launches becomes PID 1 in the container’s own PID namespace. If that process is a shell that backgrounds dockerd and then exits, the container exits with it. If a stale PID file (by default /var/run/docker.pid) survives from an earlier run, dockerd concludes that another instance is already running and refuses to start.
Common Causes and Fixes
1. Incorrect docker command in entrypoint.sh/CMD:
   - Diagnosis: Examine the Dockerfile for your Jenkins agent image. Look at the ENTRYPOINT or CMD instruction. Often, it’s something like CMD ["dockerd"] or CMD ["/usr/local/bin/dockerd-entrypoint.sh"].
   - Fix: Run dockerd in the foreground as the container’s main process instead of backgrounding it from a script. dockerd stays in the foreground by default, so launch it directly or via exec. Passing -H unix:///var/run/docker.sock makes the listening address explicit, so the daemon serves the same socket that the Docker client inside the container uses.

        # Example Dockerfile snippet
        CMD ["dockerd", "-H", "unix:///var/run/docker.sock"]

     Or, if you have a custom entrypoint script:

        #!/bin/bash
        # dockerd-entrypoint.sh
        exec dockerd -H unix:///var/run/docker.sock "$@"

   - Why it works: dockerd runs in the foreground and explicitly binds to the local Unix socket, which is how the docker CLI client inside the container communicates with the daemon. Because exec replaces the shell, dockerd itself becomes the container’s main process, so the container stays up for as long as the daemon does.
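One concrete way the daemon can come to believe it is already running is a PID file left over from a previous run. The sketch below assumes the default PID file location /var/run/docker.pid and uses a hypothetical helper name, clear_stale_pid. It removes the file only when no live process owns it, so an entrypoint script could call it just before starting dockerd. Treat it as an illustration, not as part of any official image.

```shell
#!/bin/sh
# clear_stale_pid is a hypothetical helper, not part of Docker.
# It assumes the script runs as root (as dockerd itself does), so that
# kill -0 can probe any process.
clear_stale_pid() {
  pid_file="$1"

  # Nothing to clean up if the PID file does not exist.
  [ -f "$pid_file" ] || return 0

  pid="$(cat "$pid_file")"

  # kill -0 sends no signal; it only checks whether the process exists.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "process $pid from $pid_file is still alive" >&2
    return 1
  fi

  # The recorded process is gone, so the file is stale.
  rm -f "$pid_file"
}

# Possible use in an entrypoint (path is the assumed default):
#   clear_stale_pid /var/run/docker.pid && exec dockerd -H unix:///var/run/docker.sock "$@"
```

Refusing to delete the file while its process is alive keeps the helper from masking a daemon that really is running.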
2. Missing docker client or incorrect permissions on /var/run/docker.sock:
   - Diagnosis: After starting your DinD container, try running docker ps inside it. If you get "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?", the daemon likely isn’t running (see point 1) or the client can’t reach it. If the daemon is running but you get permission denied, this is the issue.
   - Fix: Ensure the docker client is installed in your agent image and that the user Jenkins runs as (often jenkins or a specific user defined in the image) has read/write permissions on /var/run/docker.sock. The simplest way is to add the jenkins user to the docker group.

        # In your Jenkins agent Dockerfile
        RUN usermod -aG docker jenkins
        # Ensure dockerd is started *after* this user modification if applicable
        # and that the jenkins user has access to the socket.

     If you’re mounting the socket from the host, the socket keeps the host’s numeric group ID, so make sure that GID matches the GID of the docker group inside the container.
   - Why it works: The docker client needs to communicate with the daemon via the Unix socket. The docker group typically owns this socket on the host, and adding the jenkins user to this group grants the necessary permissions to interact with the daemon.
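Group membership only helps when the numeric group ID on the socket is one the current user actually holds. A minimal sketch of that check follows, assuming a Linux userland where stat accepts -c (GNU coreutils or BusyBox). The function name socket_group_matches and its optional second argument are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# socket_group_matches is a hypothetical helper.
# $1: path to inspect (defaults to the Docker socket)
# $2: optional space-separated list of GIDs to test against;
#     defaults to the GIDs of the current user.
socket_group_matches() {
  path="${1:-/var/run/docker.sock}"
  gids="${2:-$(id -G)}"

  # Numeric group ID that owns the file or socket.
  owner_gid="$(stat -c '%g' "$path")" || return 2

  for gid in $gids; do
    if [ "$gid" = "$owner_gid" ]; then
      return 0
    fi
  done
  return 1
}

# Example use inside the agent container, run as the Jenkins user:
#   socket_group_matches /var/run/docker.sock || echo "GID mismatch on docker.sock"
```

A non-zero result for the Jenkins user points at a GID mismatch between the host and the image rather than at a stopped daemon.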
3. Insufficient Privileges for the Container:
   - Diagnosis: The Docker daemon often needs elevated privileges to manipulate network interfaces, mount volumes, and manage containers. If your Jenkins agent container is run with default, unprivileged settings, dockerd might fail to start or fail to perform its operations.
   - Fix: Run the Jenkins agent container with --privileged. This grants the container almost all the capabilities of the host. Leave out the host socket mount in this setup; the inner daemon creates its own /var/run/docker.sock.

        docker run -d --privileged <your-jenkins-agent-image>

     For Jenkins, this typically means configuring your agent definition in Jenkinsfile or Jenkins.groovy to include this:

        agent {
            docker {
                image 'your-dind-agent-image:latest'
                args '--privileged'
            }
        }

     Or, as a complete declarative pipeline:

        pipeline {
            agent {
                docker {
                    image 'your-dind-agent-image:latest'
                    args '--privileged'
                }
            }
            stages { ... }
        }

   - Why it works: The --privileged flag essentially disables most security restrictions for the container, allowing the Docker daemon inside to perform low-level operations on the host’s kernel, such as creating network interfaces and mounting filesystems, which are essential for managing other containers.
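To see from inside the container whether it was started with elevated privileges, one heuristic is to inspect the effective capability mask that the kernel reports in /proc/self/status. The sketch below tests for CAP_SYS_ADMIN (capability number 21), which a default container lacks and a privileged one holds. The helper name has_cap_sys_admin is hypothetical, and holding this one capability is a strong hint rather than proof that --privileged was used.

```shell
#!/bin/sh
# has_cap_sys_admin is a hypothetical helper.
# $1: hexadecimal capability mask, as printed on the CapEff line
#     of /proc/<pid>/status (without a 0x prefix).
has_cap_sys_admin() {
  mask="$1"
  # CAP_SYS_ADMIN is capability number 21, so test bit 21 of the mask.
  [ $(( (0x$mask >> 21) & 1 )) -eq 1 ]
}

# Example use inside the agent container:
#   mask="$(awk '/^CapEff:/ {print $2}' /proc/self/status)"
#   has_cap_sys_admin "$mask" || echo "container is probably not privileged"
```

Taking the mask as an argument keeps the check usable against values copied from another container or from a log.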
4. Host Docker Daemon Not Accessible (Volume Mount Issue):
   - Diagnosis: If you’re trying to use the host’s Docker daemon from within the agent container (a common pattern to avoid DinD complexities), and commands like docker ps fail with "Cannot connect to the Docker daemon at unix:///var/run/docker.sock," it’s likely the volume mount is incorrect or the socket isn’t shared.
   - Fix: Ensure you are correctly mounting the host’s Docker socket into the agent container.

        docker run -d -v /var/run/docker.sock:/var/run/docker.sock <your-jenkins-agent-image>

     In your Jenkinsfile:

        agent {
            docker {
                image 'your-jenkins-agent-image:latest'
                args '-v /var/run/docker.sock:/var/run/docker.sock'
            }
        }

     This approach is often preferred over DinD as it’s simpler and more robust.
   - Why it works: This directly exposes the host’s running Docker daemon to the agent container. The container’s Docker client then communicates with the host daemon, effectively allowing Jenkins jobs to orchestrate containers on the host.
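When the client cannot connect, it helps to tell apart a socket that was never mounted, a path that exists but is not a socket (for example a plain file or a directory created at the mount point), and a socket the current user cannot read or write. The following sketch classifies those cases. The function name check_socket_access and its status strings are hypothetical.

```shell
#!/bin/sh
# check_socket_access is a hypothetical helper.
# $1: path to the Docker socket (defaults to /var/run/docker.sock)
# Prints one of: missing, not-a-socket, no-permission, ok
check_socket_access() {
  sock="${1:-/var/run/docker.sock}"

  if [ ! -e "$sock" ]; then
    echo "missing"            # the volume mount is probably absent
  elif [ ! -S "$sock" ]; then
    echo "not-a-socket"       # something else sits at the mount path
  elif [ ! -r "$sock" ] || [ ! -w "$sock" ]; then
    echo "no-permission"      # socket exists, but this user cannot use it
  else
    echo "ok"
  fi
}

# Example use inside the agent container:
#   check_socket_access /var/run/docker.sock
```

Note that root passes the read and write checks regardless of the socket’s mode, so run the check as the user Jenkins actually uses.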
5. Resource Constraints (CPU/Memory):
   - Diagnosis: The Docker daemon itself is a resource-intensive process. If the Jenkins agent container or the host machine is starved for CPU or memory, dockerd might fail to start or become unresponsive.
   - Fix: Increase the CPU and memory limits allocated to the Jenkins agent container, or ensure the host machine has sufficient resources.

        docker run -d --privileged --cpus 2 --memory 2g <your-jenkins-agent-image>

     For Jenkins agents, this might involve configuring resource limits on the Docker daemon that launches the agents, or specifying limits within the agent definition if your orchestrator supports it.
   - Why it works: dockerd requires a certain amount of system resources to initialize and manage its internal state and worker threads. Providing adequate CPU and RAM ensures it can start and operate without being prematurely terminated or becoming stuck.
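A quick way to confirm what memory limit the agent container really received is to read it from the cgroup filesystem and compare it with the amount you intended to grant. The sketch below assumes cgroup v2, where the limit is exposed in /sys/fs/cgroup/memory.max and the literal value max means unlimited. On cgroup v1 the file is /sys/fs/cgroup/memory/memory.limit_in_bytes instead. The helper name memory_limit_ok is hypothetical, and the 2 GiB threshold simply mirrors the --memory 2g example rather than any documented minimum for dockerd.

```shell
#!/bin/sh
# memory_limit_ok is a hypothetical helper.
# $1: the container's memory limit in bytes, or the literal "max"
# $2: the minimum number of bytes you expect the container to have
memory_limit_ok() {
  limit="$1"
  minimum="$2"

  # cgroup v2 reports "max" when no memory limit is set.
  if [ "$limit" = "max" ]; then
    return 0
  fi

  [ "$limit" -ge "$minimum" ]
}

# Example use inside the agent container (cgroup v2 path assumed):
#   limit="$(cat /sys/fs/cgroup/memory.max)"
#   memory_limit_ok "$limit" 2147483648 || echo "memory limit below 2 GiB: $limit"
```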
6. Corrupted Docker Socket or Daemon State:
   - Diagnosis: In rare cases, the Docker daemon’s state on the host or the socket file itself might become corrupted, leading to connection issues even with correct configuration.
   - Fix: Restart the Docker daemon on the host machine.

        sudo systemctl restart docker
        # or
        sudo service docker restart

     If the issue persists, you might need to investigate Docker’s data directory (/var/lib/docker on Linux) for corruption, though this is an extreme measure.
   - Why it works: Restarting the Docker daemon cleans up any stale processes, resets its internal state, and recreates the necessary socket files, resolving transient corruption issues.
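After a restart the daemon needs a moment before it accepts connections, so a job that runs docker commands immediately can still fail. A small retry loop is one way to bridge that gap. In the sketch below, wait_until is a hypothetical helper that reruns a command once per second until it succeeds or the attempt budget is used up. Pairing it with docker info relies on that command exiting non-zero while the daemon is unreachable.

```shell
#!/bin/sh
# wait_until is a hypothetical helper.
# $1: maximum number of attempts (one per second)
# remaining arguments: the command to retry
wait_until() {
  attempts="$1"
  shift

  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Success of the probed command ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done

  # The command never succeeded within the attempt budget.
  return 1
}

# Example use after restarting the daemon:
#   wait_until 30 docker info || echo "Docker daemon did not become ready"
```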
After successfully configuring DinD, the next hurdle you’ll likely face is handling the lifecycle of the Docker daemon within the agent container, ensuring it’s stopped cleanly when the agent is terminated to avoid orphaned processes or resource leaks.