Docker-in-Docker (DinD) on a Jenkins agent is surprisingly fragile. The most common failure is that the Docker daemon inside the agent container never starts, either because it lacks the privileges it needs or because leftover state makes it believe a daemon is already running.

Here’s how to make it work and the common landmines to avoid:

The Core Problem: PID 1 and Daemonization

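The Docker daemon (dockerd, which replaced the older docker daemon subcommand) is normally started as a background service by the host’s init system. Inside a container there is no init system: whatever your ENTRYPOINT or CMD launches becomes PID 1 of the container’s own PID namespace, and the container lives only as long as that process does. dockerd does not need to be PID 1, and it already runs in the foreground by default, so it fits this model as long as nothing backgrounds it and then exits. What trips it up is state and privileges. Because PIDs restart from 1 in every container, a /var/run/docker.pid left over from a previous run can point at a live but unrelated process, and dockerd can then refuse to start because it believes a daemon is already running. It also needs kernel privileges that a default container does not have (see point 3).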
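One way a daemon can wrongly conclude that it is already running is a PID file left behind by a previous run of the same container. The sketch below is one possible guard for an entrypoint script. It assumes dockerd's default PID file path (/var/run/docker.pid) and a Linux /proc filesystem; clear_stale_pidfile is a hypothetical helper name, not part of Docker.

```shell
#!/bin/sh
# Remove a leftover dockerd PID file unless the PID it names really is dockerd.
# $1: path to the PID file (assumed default: /var/run/docker.pid).
clear_stale_pidfile() {
    pidfile="${1:-/var/run/docker.pid}"
    [ -f "$pidfile" ] || return 0            # nothing to clean up

    pid="$(cat "$pidfile" 2>/dev/null || true)"

    # On Linux, /proc/<pid>/comm holds the name of the process with that PID.
    if [ -n "$pid" ] && [ "$(cat "/proc/$pid/comm" 2>/dev/null)" = "dockerd" ]; then
        echo "dockerd is already running as PID $pid" >&2
        return 1
    fi

    # The PID is dead or belongs to some other process: the file is stale.
    rm -f "$pidfile"
}
```

Calling clear_stale_pidfile immediately before starting dockerd in the entrypoint would cover the restart case.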

Common Causes and Fixes

  1. Incorrect docker command in entrypoint.sh / CMD:

    • Diagnosis: Examine the Dockerfile for your Jenkins agent image. Look at the ENTRYPOINT or CMD instruction. Often, it’s something like CMD ["dockerd"] or CMD ["/usr/local/bin/dockerd-entrypoint.sh"].
    • Fix: dockerd already runs in the foreground by default, so the goal is to make it the container’s main process and to be explicit about where it listens. Passing -H unix:///var/run/docker.sock pins it to the local Unix socket rather than a TCP port, which is the same address the Docker client inside the container uses by default.
      # Example Dockerfile snippet
      CMD ["dockerd", "-H", "unix:///var/run/docker.sock"]
      
      Or, if you have a custom entrypoint script:
      #!/bin/bash
      # dockerd-entrypoint.sh
      exec dockerd -H unix:///var/run/docker.sock "$@"
      
    • Why it works: With the exec-form CMD, or with exec in the entrypoint script, dockerd itself becomes the container’s main process (PID 1). It receives stop signals directly and keeps the container alive for as long as it runs. Binding explicitly to the Unix socket matches the address the docker CLI inside the container connects to by default.
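Because dockerd takes a moment to come up, build steps that call docker right after the container starts can fail even when the configuration is correct. A minimal readiness check, assuming the docker CLI is on PATH; wait_for_daemon is a hypothetical helper, and the probe command can be overridden:

```shell
#!/bin/sh
# Poll a probe command once per second until it succeeds or the timeout expires.
# $1: timeout in seconds; remaining arguments: probe command (default: docker info).
wait_for_daemon() {
    timeout_s="$1"
    shift
    [ "$#" -gt 0 ] || set -- docker info

    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1                         # daemon never became reachable
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `wait_for_daemon 30 || exit 1` at the start of a build step waits up to roughly 30 seconds for the daemon to answer.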
  2. Missing docker client or incorrect permissions on /var/run/docker.sock:

    • Diagnosis: After starting your DinD container, try running docker ps inside it. If you get "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?", the daemon likely isn’t running (see point 1) or the client can’t reach it. If the daemon is running but you get permission denied, this is the issue.
    • Fix: Ensure the docker client is installed in your agent image and that the user Jenkins runs as (often jenkins or a specific user defined in the image) has read/write permissions on /var/run/docker.sock. The simplest way is to add the jenkins user to the docker group.
      # In your Jenkins agent Dockerfile
      RUN usermod -aG docker jenkins
      # Ensure dockerd is started *after* this user modification if applicable
      # and that the jenkins user has access to the socket.
      
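      If you’re mounting the socket from the host, note that permissions are checked by numeric group ID: the GID that owns the socket on the host must match a group the jenkins user belongs to inside the container, and the container’s docker group often has a different GID than the host’s.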
    • Why it works: The docker client needs to communicate with the daemon via the Unix socket. The docker group typically owns this socket on the host, and adding the jenkins user to this group grants the necessary permissions to interact with the daemon.
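When the socket is bind-mounted from the host, what matters is whether the numeric group that owns the socket is among the current user's groups. A small diagnostic sketch, assuming a stat that supports -c (GNU coreutils or BusyBox); socket_gid, in_group, and check_socket_access are hypothetical helper names:

```shell
#!/bin/sh
# Print the numeric group ID that owns the given path.
socket_gid() {
    stat -c '%g' "${1:-/var/run/docker.sock}"
}

# Succeed if the current user belongs to the numeric group ID in $1.
in_group() {
    for g in $(id -G); do
        if [ "$g" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Report whether the current user shares a group with the socket.
# Note: root can open the socket regardless of group membership.
check_socket_access() {
    sock="${1:-/var/run/docker.sock}"
    gid="$(socket_gid "$sock")" || return 2
    if in_group "$gid"; then
        echo "ok: current user is in group $gid"
    else
        echo "user $(id -un) is not in group $gid, which owns $sock" >&2
        return 1
    fi
}
```

Running check_socket_access as the jenkins user inside the agent shows the GID that the container-side group needs to have.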
  3. Insufficient Privileges for the Container:

    • Diagnosis: The Docker daemon often needs elevated privileges to manipulate network interfaces, mount volumes, and manage containers. If your Jenkins agent container is run with default, unprivileged settings, dockerd might fail to start or fail to perform its operations.
    • Fix: Run the Jenkins agent container with --privileged. This grants the container almost all the capabilities of the host.
      docker run -d --privileged <your-jenkins-agent-image>
      
      For Jenkins, this typically means passing the flag through the agent definition in your Jenkinsfile:
      agent {
          docker {
              image 'your-dind-agent-image:latest'
              args '--privileged'
          }
      }
      
      Or, shown in the context of a complete declarative pipeline:
      pipeline {
          agent {
              docker {
                  image 'your-dind-agent-image:latest'
                  args '--privileged'
              }
          }
          stages { ... }
      }
      
    • Why it works: The --privileged flag essentially disables most security restrictions for the container, allowing the Docker daemon inside to perform low-level operations on the host’s kernel, such as creating network interfaces and mounting filesystems, which are essential for managing other containers.
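To confirm whether a container actually received the extra privileges, you can inspect the effective capability mask that the kernel reports in /proc/self/status. A rough sketch, assuming Linux and using CAP_SYS_ADMIN (capability bit 21, required for mount operations) as the indicator; cap_bit_set and has_sys_admin are hypothetical helper names:

```shell
#!/bin/sh
# Succeed if bit number $2 is set in the hexadecimal capability mask $1.
cap_bit_set() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Succeed if the current process holds CAP_SYS_ADMIN (bit 21) in its
# effective capability set, read from the CapEff line of /proc/self/status.
has_sys_admin() {
    mask="$(awk '/^CapEff:/ {print $2}' /proc/self/status)"
    cap_bit_set "$mask" 21
}
```

Inside the agent container, `has_sys_admin || echo "container is not privileged enough for dockerd"` gives a quick answer before digging through daemon logs.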
  4. Host Docker Daemon Not Accessible (Volume Mount Issue):

    • Diagnosis: If you’re trying to use the host’s Docker daemon from within the agent container (a common pattern to avoid DinD complexities), and commands like docker ps fail with "Cannot connect to the Docker daemon at unix:///var/run/docker.sock," it’s likely the volume mount is incorrect or the socket isn’t shared.
    • Fix: Ensure you are correctly mounting the host’s Docker socket into the agent container.
      docker run -d -v /var/run/docker.sock:/var/run/docker.sock <your-jenkins-agent-image>
      
      In your Jenkinsfile:
      agent {
          docker {
              image 'your-jenkins-agent-image:latest'
              args '-v /var/run/docker.sock:/var/run/docker.sock'
          }
      }
      
      This approach is often preferred over DinD as it’s simpler and more robust.
    • Why it works: This directly exposes the host’s running Docker daemon to the agent container. The container’s Docker client then communicates with the host daemon, effectively allowing Jenkins jobs to orchestrate containers on the host.
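A quick check inside the agent can distinguish a missing mount from a wrong one. In particular, when the host path given to -v does not exist, Docker creates it as a directory, so the container sees a directory where the socket should be. A sketch with a hypothetical helper, check_docker_sock:

```shell
#!/bin/sh
# Classify the state of the Docker socket path.
# Returns 0 if it is a socket, 1 if it is missing, 2 if it exists but is not a socket.
check_docker_sock() {
    sock="${1:-/var/run/docker.sock}"
    if [ -S "$sock" ]; then
        echo "socket present: $sock"
        return 0
    elif [ -e "$sock" ]; then
        echo "$sock exists but is not a socket (bad bind mount?)" >&2
        return 2
    else
        echo "$sock is missing (socket not mounted?)" >&2
        return 1
    fi
}
```

A return code of 0 only shows that a socket file is there; whether a daemon answers on it is a separate question.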
  5. Resource Constraints (CPU/Memory):

    • Diagnosis: dockerd and every container it starts count against the agent container’s CPU and memory limits. If the agent container or the host machine is starved for CPU or memory, dockerd might fail to start, be killed by the kernel’s OOM killer, or become unresponsive.
    • Fix: Increase the CPU and memory limits allocated to the Jenkins agent container, or ensure the host machine has sufficient resources.
      docker run -d --privileged --cpus 2 --memory 2g <your-jenkins-agent-image>
      
      For Jenkins agents, this might involve configuring resource limits on the Docker daemon that launches the agents, or specifying limits within the agent definition if your orchestrator supports it.
    • Why it works: dockerd requires a certain amount of system resources to initialize and manage its internal state and worker threads. Providing adequate CPU and RAM ensures it can start and operate without being prematurely terminated or becoming stuck.
  6. Corrupted Docker Socket or Daemon State:

    • Diagnosis: In rare cases, the Docker daemon’s state on the host or the socket file itself might become corrupted, leading to connection issues even with correct configuration.
    • Fix: Restart the Docker daemon on the host machine.
      sudo systemctl restart docker
      # or
      sudo service docker restart
      
      If the issue persists, you might need to investigate Docker’s data directory (/var/lib/docker on Linux) for corruption, though this is an extreme measure.
    • Why it works: Restarting the Docker daemon cleans up any stale processes, resets its internal state, and recreates the necessary socket files, resolving transient corruption issues.

After successfully configuring DinD, the next hurdle you’ll likely face is handling the lifecycle of the Docker daemon within the agent container, ensuring it’s stopped cleanly when the agent is terminated to avoid orphaned processes or resource leaks.
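One possible shape for that cleanup, as a sketch only: start the daemon as a child of the entrypoint shell, forward termination signals to it, and wait for it to exit. This is only relevant when dockerd is not exec'd as the container's main process, for example when the entrypoint also has to run something else. It assumes a POSIX shell; stop_child and run_supervised are hypothetical names.

```shell
#!/bin/sh
# Send SIGTERM to a child process of this shell and wait until it has exited.
stop_child() {
    kill -TERM "$1" 2>/dev/null || return 0   # already gone
    wait "$1" 2>/dev/null || true             # reap it; exit status is ignored here
}

# Run a command in the background and stop it cleanly when this shell
# receives SIGTERM or SIGINT (SIGTERM is what `docker stop` sends by default).
run_supervised() {
    "$@" &
    child=$!
    trap 'stop_child "$child"' TERM INT
    wait "$child"
}
```

An entrypoint could then end with `run_supervised dockerd -H unix:///var/run/docker.sock`, so that stopping the agent container also stops the inner daemon.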
