A surprising number of secrets are found in GitHub repos, and they’re often not where you’d expect them to be.

Let’s say you have a GitHub repository for your company’s web application. Inside, you might have a Dockerfile that builds your application image. It could look something like this:

FROM ubuntu:latest

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt /app/
RUN pip3 install --no-cache-dir -r /app/requirements.txt

COPY . /app/
WORKDIR /app

# This is where the problem might be...
RUN echo "MY_API_KEY=sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxx" >> /app/.env

CMD ["python3", "main.py"]

In this Dockerfile, a hardcoded API key for a live service is written into /app/.env during the image build. If the repository is public, or if an unauthorized person gains access to it, the key is compromised. The same is true for anyone who can pull the built image, because the key is stored in one of its layers. The result can be unauthorized use of the service, with significant costs or a data breach.

The fundamental problem that secrets detection solves is the accidental exposure of sensitive credentials (API keys, passwords, private certificates, database connection strings, and so on) embedded in source code. This exposure can happen in several ways: a developer forgets to add a file to .gitignore, a CI/CD pipeline is misconfigured, or the team is simply unaware of where secrets can end up.

Internally, secrets detection tools scan your codebase, often line by line, for patterns that match known secret formats. These patterns are typically regular expressions, but more sophisticated tools also use entropy analysis to flag strings that look too random to be ordinary code or configuration. For example, if a provider's API keys look like sk_live_[a-zA-Z0-9]{48}, the tool searches for that pattern.
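To make the two techniques concrete, here is a minimal Python sketch that combines a regex check with a per-character Shannon entropy check. The names KEY_PATTERN, TOKEN_PATTERN, and scan_line, along with the threshold of 4.0 bits, are assumptions chosen for illustration; real tools ship many more rules and tune their thresholds carefully.

```python
import math
import re
from collections import Counter

# Illustrative pattern taken from the text; real key formats vary by provider.
KEY_PATTERN = re.compile(r"sk_live_[a-zA-Z0-9]{48}")

# Candidate tokens for the entropy check: long runs of key-like characters.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9+/=_-]{20,}")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_line(line: str, entropy_threshold: float = 4.0) -> list[str]:
    """Return the reasons why `line` looks like it contains a secret."""
    reasons = []
    if KEY_PATTERN.search(line):
        reasons.append("matches key pattern")
    for token in TOKEN_PATTERN.findall(line):
        # Entropy here is based only on character frequencies within the token.
        if shannon_entropy(token) >= entropy_threshold:
            reasons.append("high-entropy token")
            break
    return reasons
```

The regex catches keys with a known shape, while the entropy check is a fallback for credentials that have no recognizable prefix. The fallback is also the main source of false positives (hashes, UUIDs, minified assets), which is why tools let you tune or disable it.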

The levers you control are primarily the configuration of the scanning tool itself. This includes:

  • The scope of the scan: Which branches, directories, or files to include or exclude.
  • The types of secrets to detect: You can often enable or disable specific categories of secrets (e.g., AWS keys, GitHub tokens, SSH private keys).
  • The severity of findings: Some tools allow you to define what constitutes a high, medium, or low severity issue.
  • Integration points: How the scanner integrates with your Git hosting platform (like GitHub), your CI/CD pipeline, or your local development environment.
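The first three levers can be pictured as fields on a small configuration object. The sketch below is not the configuration format of any real tool: ScanConfig, RULES, and scan are hypothetical names, and the three rules are illustrative stand-ins for a much larger rule set.

```python
import re
from dataclasses import dataclass, field
from fnmatch import fnmatch

# Hypothetical rule table: name -> (pattern, severity). Illustrative only.
RULES = {
    "aws-access-key-id": (re.compile(r"AKIA[0-9A-Z]{16}"), "high"),
    "private-key-header": (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"), "high"),
    "generic-password": (re.compile(r"(?i)password\s*=\s*\S+"), "medium"),
}

SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass
class ScanConfig:
    # Scope: paths matching any of these globs are skipped.
    exclude_globs: list[str] = field(default_factory=list)
    # Secret types: only rules named here are applied.
    enabled_rules: set[str] = field(default_factory=lambda: set(RULES))
    # Severity: findings below this level are not reported.
    min_severity: str = "low"


def scan(files: dict[str, str], config: ScanConfig) -> list[tuple[str, int, str, str]]:
    """Return (path, line_number, rule_name, severity) for each finding."""
    findings = []
    for path, text in files.items():
        if any(fnmatch(path, glob) for glob in config.exclude_globs):
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in sorted(config.enabled_rules):
                pattern, severity = RULES[name]
                if SEVERITY_ORDER[severity] < SEVERITY_ORDER[config.min_severity]:
                    continue
                if pattern.search(line):
                    findings.append((path, lineno, name, severity))
    return findings
```

Excluding test fixtures, for example, is then a matter of passing ScanConfig(exclude_globs=["tests/*"]). The fourth lever, integration points, is about where a function like scan gets called from rather than how it is configured.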

Let’s look at a practical example of how a tool like git-secrets might be used. First, you’d install it, often via a package manager or by cloning the repository. Then, you can install it as a Git hook:

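git secrets --install
git secrets --add 'sk_live_[a-zA-Z0-9]+'

The first command, run from inside the repository, sets up Git hooks (including a pre-commit hook), so git-secrets scans each commit before it is created. The second registers a pattern to block; git-secrets ships with no patterns of its own, though git secrets --register-aws adds a set for AWS credentials. When a commit matches a prohibited pattern, git-secrets rejects it and prints the offending line, along these lines:

Dockerfile:15:RUN echo "MY_API_KEY=sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxx" >> /app/.env

[ERROR] Matched one or more prohibited patterns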

You can then remove the secret before committing or, if it’s a false positive, register an allowed pattern with git secrets --add --allowed so that matching lines pass in future commits.
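To show what a pre-commit check of this kind does conceptually, here is a small Python sketch that scans the added lines of a git-style unified diff (for example, the output of git diff --cached) against prohibited and allowed patterns. This is not how git-secrets is implemented; scan_added_lines and its parameters are hypothetical names, and the parser assumes each file section starts with a "diff --git" line.

```python
import re

# Hunk header, e.g. "@@ -13,0 +14,2 @@"; group 1 is the first new-file line number.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def scan_added_lines(diff_text, prohibited, allowed=()):
    """Return (path, line_number, text) for each added line that matches a
    prohibited pattern and does not match any allowed pattern."""
    findings = []
    path = None
    new_lineno = 0
    in_hunk = False
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git "):
            in_hunk = False  # a new file section begins
        elif not in_hunk and raw.startswith("+++ "):
            # Header naming the new version of the file, e.g. "+++ b/Dockerfile".
            target = raw[4:]
            path = target[2:] if target.startswith("b/") else target
        elif (match := HUNK_HEADER.match(raw)):
            in_hunk = True
            new_lineno = int(match.group(1))
        elif in_hunk and raw.startswith("+"):
            text = raw[1:]
            blocked = any(re.search(p, text) for p in prohibited)
            excused = any(re.search(a, text) for a in allowed)
            if blocked and not excused:
                findings.append((path, new_lineno, text))
            new_lineno += 1
        elif in_hunk and raw.startswith(" "):
            new_lineno += 1  # context line: present in the new file
        # Removed lines ("-") do not advance the new-file line counter.
    return findings
```

A hook built on this would exit with a non-zero status whenever the returned list is non-empty, which is what makes Git abort the commit. The allowed patterns play the same role as the allow list described above: they excuse known false positives without disabling the rule.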

A more advanced scenario involves integrating secrets scanning into your CI/CD pipeline. For instance, in a GitHub Actions workflow, you might use a dedicated action:

name: CI

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0  # fetch full history so earlier commits can be scanned
    - name: Run secrets detection
      uses: gitleaks/gitleaks-action@v2
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

This action, gitleaks-action, runs gitleaks on every push. If it finds any secrets, the workflow fails, which flags the leak and can block the change from being merged or deployed. The configuration for gitleaks can be further customized via a .gitleaks.toml file in the repository root, allowing fine-grained control over rules and exclusions.

Many developers don’t realize that adding a file to .gitignore after a secret in it has already been committed is insufficient. Git tracks history, so the secret is still present in earlier commits. To remove it from the repository’s history, you need a tool such as git filter-repo or BFG Repo-Cleaner to rewrite the commits. This is a destructive operation and should be done with extreme caution, especially on shared repositories, because it changes commit hashes and leaves existing clones and branches out of sync. Even after the rewrite, treat the secret as compromised and revoke or rotate it, since anyone who cloned the repository earlier may still have a copy.
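Before rewriting anything, it helps to see which commits actually contain the secret. The sketch below wraps git's "pickaxe" search (git log -S), which lists commits whose diffs add or remove a given string. The function names are hypothetical, and the code assumes git is installed and that repo points at a working clone. Searching for a distinctive prefix rather than the full secret avoids putting the secret itself on a command line.

```python
import subprocess


def history_search_command(needle: str, repo: str = ".") -> list[str]:
    """Build a `git log` command that lists commits, on any ref, whose
    diffs change the number of occurrences of `needle`."""
    return ["git", "-C", repo, "log", "--all", "--format=%H", f"-S{needle}"]


def commits_touching(needle: str, repo: str = ".") -> list[str]:
    """Run the search and return the matching commit hashes."""
    result = subprocess.run(
        history_search_command(needle, repo),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()
```

If commits_touching("sk_live_") returns any hashes after the file has been deleted or ignored, the secret is still recoverable from history.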

The next challenge is managing secrets effectively, rather than just detecting their leakage.
