MLflow in CI/CD: Automate Model Training and Promotion

The most surprising thing about MLflow in CI/CD is that it doesn’t just log metrics; it becomes a core component of your deployment pipeline, acting as the single source of truth for model artifacts and their lineage.

Let’s see what that looks like. Imagine a GitHub Actions workflow triggered by a code push to your main branch.

name: CI/CD MLflow Pipeline

on:
  push:
    branches:
      - main

jobs:
  train_and_deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: pip install -r requirements.txt "mlflow[extras]"

      - name: Train model and log to MLflow
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
          MLFLOW_EXPERIMENT_NAME: my-model-experiment
        run: |
          python train.py --data-path data/processed.csv --learning-rate 0.01 --epochs 50

In train.py, you’d have code like this:

import mlflow
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import argparse

def train():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-path", type=str, required=True)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=50)
    args = parser.parse_args()

    data = pd.read_csv(args.data_path)
    X = data[['feature1', 'feature2']]
    y = data['target']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Select the experiment (created if it does not exist), then start a run in it
    mlflow.set_experiment("my-model-experiment")
    with mlflow.start_run():
        # Log parameters (for illustration only: RandomForestRegressor below
        # does not use learning_rate or epochs)
        mlflow.log_param("learning_rate", args.learning_rate)
        mlflow.log_param("epochs", args.epochs)

        # Train the model
        model = RandomForestRegressor(n_estimators=100, random_state=42)
        model.fit(X_train, y_train)

        # Make predictions and evaluate
        predictions = model.predict(X_test)
        # RMSE as the square root of MSE (squared=False is deprecated since scikit-learn 1.4)
        rmse = mean_squared_error(y_test, predictions) ** 0.5

        # Log metrics
        mlflow.log_metric("rmse", rmse)

        # Log the model
        mlflow.sklearn.log_model(model, "random-forest-model")

        print(f"Training complete. RMSE: {rmse}")
        print(f"MLflow Run ID: {mlflow.active_run().info.run_id}")

if __name__ == "__main__":
    train()

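This workflow checks out your code, sets up Python, installs dependencies, and then runs your train.py script. The key pieces are the MLFLOW_TRACKING_URI and MLFLOW_EXPERIMENT_NAME environment variables. MLFLOW_TRACKING_URI points to your MLflow server. It must be an address the CI runner can reach, so http://localhost:5000 works only if the server runs on the runner itself; otherwise use a hosted MLflow instance. MLFLOW_EXPERIMENT_NAME names the experiment that runs are grouped under when the script does not choose one itself. The train.py shown here calls mlflow.set_experiment explicitly, which takes precedence over the variable.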

Your train.py script then uses mlflow.start_run() to create a new MLflow run. Inside this run, it logs hyperparameters (mlflow.log_param), evaluation metrics (mlflow.log_metric), and crucially, the trained model artifact (mlflow.sklearn.log_model). This means every time you push to main, a new, versioned model is trained and logged to your MLflow tracking server, along with all its associated metadata.
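To make the lineage of each run explicit, the CI metadata can be attached as tags. The following is a minimal sketch, not part of the original train.py: the GITHUB_* names are default GitHub Actions environment variables, while the helper name ci_tags and the tag keys (git_sha, git_branch, ci_run_id) are assumptions chosen for illustration.

```python
import os
from typing import Mapping

# Default GitHub Actions variables mapped to hypothetical MLflow tag keys.
_CI_ENV_TO_TAG = {
    "GITHUB_SHA": "git_sha",
    "GITHUB_REF_NAME": "git_branch",
    "GITHUB_RUN_ID": "ci_run_id",
}


def ci_tags(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the CI metadata present in `env` as MLflow tag pairs.

    Variables that are missing or empty (e.g. when running locally)
    are skipped rather than logged as blank tags.
    """
    return {tag: env[var] for var, tag in _CI_ENV_TO_TAG.items() if env.get(var)}


# Inside the `with mlflow.start_run():` block of train.py one could then call:
#     mlflow.set_tags(ci_tags())
```

With tags like these, a run in the MLflow UI can be traced back to the commit and workflow run that produced it.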

The power comes from how this integrates with promotion. After training, a later pipeline step or a manual review in the MLflow UI checks the latest run’s metrics. If the performance is satisfactory, you "promote" the run. MLflow supports this through tags on runs or, if you register the model in the Model Registry, through aliases on model versions.
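What counts as "satisfactory" is worth writing down as code so the pipeline can enforce it. A minimal sketch of such a gate, assuming RMSE is the deciding metric; the function name should_promote and the min_improvement threshold are hypothetical and not part of MLflow:

```python
from typing import Optional


def should_promote(
    candidate_rmse: float,
    production_rmse: Optional[float],
    min_improvement: float = 0.0,
) -> bool:
    """Decide whether a candidate run may replace the production model.

    RMSE is an error metric, so lower is better. The candidate must beat
    the current production RMSE by at least `min_improvement`.
    """
    if production_rmse is None:
        # Nothing is in production yet, so the first candidate is accepted.
        return True
    return candidate_rmse <= production_rmse - min_improvement
```

A CI step could call this with the RMSE of the new run and of the currently tagged run, and skip tagging when it returns False.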

For instance, you could add a step that tags the best-performing run:

      - name: Tag best model
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          # Find the lowest-RMSE run and tag it through the MLflow Python API
          python - <<'EOF'
          import mlflow
          best = mlflow.search_runs(
              experiment_names=["my-model-experiment"],
              order_by=["metrics.rmse ASC"],
              max_results=1,
          )
          mlflow.MlflowClient().set_tag(best["run_id"].iloc[0], "stage", "production")
          EOF

This script finds the run with the lowest RMSE in the "my-model-experiment" experiment and tags it with stage: production. Now, your deployment pipeline can query MLflow for the model artifact associated with the run tagged as production.

      - name: Deploy production model
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          # Look up the tagged run through the Python API; if several runs
          # carry the tag, take the one with the lowest RMSE.
          PROD_RUN_ID=$(python - <<'EOF'
          import mlflow
          runs = mlflow.search_runs(
              experiment_names=["my-model-experiment"],
              filter_string="tags.stage = 'production'",
              order_by=["metrics.rmse ASC"],
              max_results=1,
          )
          print("" if runs.empty else runs["run_id"].iloc[0])
          EOF
          )
          if [ -z "$PROD_RUN_ID" ]; then
            echo "No production model found."
            exit 1
          fi
          echo "Deploying model from run ID: $PROD_RUN_ID"
          # Example: Download and deploy the model artifact
          mlflow artifacts download --run-id "$PROD_RUN_ID" --artifact-path "random-forest-model" --dst-path ./model_to_deploy
          # Your deployment commands here...
          echo "Model deployed successfully."

This gives you an auditable, automated path from training to production: every deployed model traces back to a run with its parameters and metrics, and model selection needs no manual step.
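One detail the tagging approach leaves open is that a run tagged earlier keeps its stage tag when a newer run is promoted. A minimal sketch of moving the tag, assuming `client` behaves like mlflow.MlflowClient (its search_runs and set_tag methods are used); the function name move_production_tag and the "archived" tag value are hypothetical:

```python
def move_production_tag(client, experiment_id: str, new_run_id: str) -> list[str]:
    """Tag `new_run_id` as production and mark previously tagged runs as archived.

    Returns the IDs of the runs that were demoted. Only the first page of
    search results is considered, which is enough for a handful of tagged runs.
    """
    previous = client.search_runs(
        experiment_ids=[experiment_id],
        filter_string="tags.stage = 'production'",
    )
    demoted = []
    for run in previous:
        if run.info.run_id != new_run_id:
            # "archived" is an illustrative value, not an MLflow convention for run tags.
            client.set_tag(run.info.run_id, "stage", "archived")
            demoted.append(run.info.run_id)
    client.set_tag(new_run_id, "stage", "production")
    return demoted
```

Passing the client in as an argument keeps the function easy to exercise against a stub before pointing it at a real tracking server.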

What often goes unnoticed is that a logged model is more than a pickled file. mlflow.sklearn.log_model writes a structured directory to the artifact store: the serialized model, an MLmodel metadata file describing how to load it, and environment files such as conda.yaml and requirements.txt with the (often inferred) dependencies. The tracking server records which run produced that directory, so each model is versioned by its run and can be found by querying runs. This makes the model reproducible and loadable in other environments, and it is why you can later retrieve it with mlflow.sklearn.load_model or download it directly.
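Retrieval by run can be sketched as follows. The runs:/ URI scheme and mlflow.sklearn.load_model are MLflow features; the helper names run_model_uri and load_logged_model are hypothetical, and the default artifact path assumes the "random-forest-model" name used in train.py.

```python
def run_model_uri(run_id: str, artifact_path: str = "random-forest-model") -> str:
    """Build the runs:/ URI that MLflow resolves to a model logged under a run."""
    return f"runs:/{run_id}/{artifact_path}"


def load_logged_model(run_id: str):
    """Load the scikit-learn model logged by train.py for the given run.

    Requires mlflow to be installed and MLFLOW_TRACKING_URI to point at
    the server that holds the run.
    """
    import mlflow.sklearn  # imported lazily so the URI helper works without mlflow

    return mlflow.sklearn.load_model(run_model_uri(run_id))
```

A deployment script could call load_logged_model with the production run ID and use the returned estimator's predict method directly.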

The next logical step is integrating model serving directly from MLflow, potentially using MLflow’s built-in model serving capabilities or connecting to platforms like Seldon Core or Kubeflow.
