MLflow in CI/CD: Automate Model Training and Promotion
The most surprising thing about MLflow in CI/CD is that it doesn’t just log metrics; it becomes a core component of your deployment pipeline, acting as the single source of truth for model artifacts and their lineage.
Let’s see what that looks like. Imagine a GitHub Actions workflow triggered by a code push to your main branch.
name: CI/CD MLflow Pipeline
on:
  push:
    branches:
      - main
jobs:
  train_and_deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: pip install -r requirements.txt mlflow[extras]
      - name: Train model and log to MLflow
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
          MLFLOW_EXPERIMENT_NAME: my-model-experiment
        run: |
          python train.py --data-path data/processed.csv --learning-rate 0.01 --epochs 50
In train.py, you’d have code like this:
import argparse
import os

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def train():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-path", type=str, required=True)
    # Logged for illustration only: RandomForestRegressor has no learning rate or epochs
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=50)
    args = parser.parse_args()

    data = pd.read_csv(args.data_path)
    X = data[['feature1', 'feature2']]
    y = data['target']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Use the experiment named by the workflow, with a fallback for local runs
    mlflow.set_experiment(os.environ.get("MLFLOW_EXPERIMENT_NAME", "my-model-experiment"))

    # Start an MLflow run
    with mlflow.start_run() as run:
        # Log parameters
        mlflow.log_param("learning_rate", args.learning_rate)
        mlflow.log_param("epochs", args.epochs)

        # Train the model
        model = RandomForestRegressor(n_estimators=100, random_state=42)
        model.fit(X_train, y_train)

        # Make predictions and evaluate
        predictions = model.predict(X_test)
        # Take the square root by hand: squared=False is gone from newer scikit-learn releases
        rmse = mean_squared_error(y_test, predictions) ** 0.5

        # Log metrics
        mlflow.log_metric("rmse", rmse)

        # Log the model
        mlflow.sklearn.log_model(model, "random-forest-model")

        # Read the run ID while the run is still active
        print(f"Training complete. RMSE: {rmse}")
        print(f"MLflow Run ID: {run.info.run_id}")


if __name__ == "__main__":
    train()
This workflow automatically checks out your code, sets up Python, installs dependencies, and then runs your train.py script. The key here is the MLFLOW_TRACKING_URI and MLFLOW_EXPERIMENT_NAME environment variables. MLFLOW_TRACKING_URI points to an MLflow server the runner can reach, typically a cloud-hosted MLflow instance (http://localhost:5000 only works if the server runs on the runner itself), and MLFLOW_EXPERIMENT_NAME names the experiment your runs are grouped under.
Your train.py script then uses mlflow.start_run() to create a new MLflow run. Inside this run, it logs hyperparameters (mlflow.log_param), evaluation metrics (mlflow.log_metric), and crucially, the trained model artifact (mlflow.sklearn.log_model). This means every push to main trains a new model and logs it, together with its parameters and metrics, under a unique run ID on your MLflow tracking server.
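Because the tracking URI comes from a repository secret, a missing or misspelled secret reaches the script as an empty string, and without a configured tracking URI MLflow falls back to local storage, so the run may be written to the runner's disk and lost when the job ends. Here is a minimal sketch of a fail-fast check that train.py could call before starting the run. The helper name resolve_tracking_config is hypothetical, and the default experiment name is simply the one used in the workflow above.

```python
import os
from typing import Mapping, Tuple


def resolve_tracking_config(
    env: Mapping[str, str] = os.environ,
    default_experiment: str = "my-model-experiment",
) -> Tuple[str, str]:
    """Return (tracking_uri, experiment_name), or raise if the URI is missing.

    Hypothetical helper: it only reads the two variables set in the workflow.
    """
    tracking_uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if not tracking_uri:
        # An unset GitHub secret expands to an empty string, so treat
        # "empty" and "missing" the same way.
        raise RuntimeError("MLFLOW_TRACKING_URI is not set; refusing to train.")
    experiment = env.get("MLFLOW_EXPERIMENT_NAME", "").strip() or default_experiment
    return tracking_uri, experiment
```

Failing the job here is a design choice: a red build is easier to notice than a run that silently never reached the tracking server.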
The power comes from how this integrates with promotion. After training, a later step or a manual reviewer can inspect the latest run’s metrics in the MLflow UI. If the performance is satisfactory, you then "promote" the run. MLflow gives you two ways to mark that: tags on a run, or, if you register the model in the Model Registry, aliases on a model version.
For instance, you could add a step that tags the best performing run:
- name: Tag best model
  env:
    MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
  run: |
    python - <<'EOF'
    import mlflow
    from mlflow.tracking import MlflowClient
    # The mlflow CLI cannot sort runs by metric or set tags, so use the Python API
    best = mlflow.search_runs(experiment_names=["my-model-experiment"], order_by=["metrics.rmse ASC"], max_results=1)
    MlflowClient().set_tag(best.iloc[0]["run_id"], "stage", "production")
    EOF
This script finds the run with the lowest RMSE in the "my-model-experiment" experiment and tags it with stage: production. Now, your deployment pipeline can query MLflow for the model artifact associated with the run tagged as production.
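One caveat with tagging runs: the tag from an earlier promotion stays on its old run unless something removes it, so over time several runs can carry stage: production. A minimal sketch of moving the tag instead of only adding it follows. It assumes a client object that behaves like mlflow.tracking.MlflowClient (only its search_runs, set_tag, and delete_tag methods are used), and move_stage_tag is a hypothetical helper name.

```python
def move_stage_tag(client, experiment_id, metric="rmse", key="stage", value="production"):
    """Tag the run with the lowest `metric` and clear the tag from all other runs.

    Returns the ID of the tagged run, or None if the experiment has no runs.
    """
    best = client.search_runs(
        [experiment_id], order_by=[f"metrics.{metric} ASC"], max_results=1
    )
    if not best:
        return None
    best_id = best[0].info.run_id

    # Find every run that currently carries the tag and untag the stale ones.
    tagged = client.search_runs(
        [experiment_id], filter_string=f"tags.{key} = '{value}'"
    )
    for run in tagged:
        if run.info.run_id != best_id:
            client.delete_tag(run.info.run_id, key)

    client.set_tag(best_id, key, value)
    return best_id
```

Called with MlflowClient() and the experiment's ID (for example from client.get_experiment_by_name("my-model-experiment").experiment_id), this leaves at most one tagged run for the deployment step that follows to pick up.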
- name: Deploy production model
  env:
    MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
  run: |
    # Look up the tagged run through the Python API; `mlflow runs list` cannot filter by tag
    PROD_RUN_ID=$(python - <<'EOF'
    import mlflow
    runs = mlflow.search_runs(experiment_names=["my-model-experiment"], filter_string="tags.stage = 'production'", order_by=["metrics.rmse ASC"], max_results=1)
    print(runs.iloc[0]["run_id"] if not runs.empty else "")
    EOF
    )
    if [ -z "$PROD_RUN_ID" ]; then
      echo "No production model found."
      exit 1
    fi
    echo "Deploying model from run ID: $PROD_RUN_ID"
    # Example: Download and deploy the model artifact
    mlflow artifacts download --run-id "$PROD_RUN_ID" --artifact-path "random-forest-model" --dst-path ./model_to_deploy
    # Your deployment commands here...
    echo "Model deployed successfully."
Because every deployed model traces back to a run ID with its logged parameters and metrics, the lifecycle is auditable, and a model can move from training to production without anyone selecting it by hand.
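Automated selection is only as safe as the rule behind it. The tagging step above promotes whichever run has the lowest RMSE, even when the gain over the current production run is within noise. A minimal sketch of a stricter gate follows. The function name and the 1% threshold are assumptions for illustration, not MLflow settings.

```python
import math


def should_promote(candidate_rmse, production_rmse=None, min_rel_improvement=0.01):
    """Decide whether a candidate run should replace the production run.

    Lower RMSE is better. `min_rel_improvement` is an arbitrary illustrative
    threshold: the candidate must beat production by at least this fraction.
    """
    if candidate_rmse is None or math.isnan(candidate_rmse):
        return False  # a run without a usable metric is never promoted
    if production_rmse is None:
        return True  # nothing in production yet, so accept the first candidate
    return candidate_rmse <= production_rmse * (1.0 - min_rel_improvement)
```

In a pipeline, the two RMSE values would come from the "rmse" metric of the new run and of the run currently tagged as production. The tagging step would run only when the function returns True.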
The thing most people don’t realize is that logging a model does more than dump a pickled file into the artifact store. When you call mlflow.sklearn.log_model, MLflow writes a model directory containing the serialized model, an MLmodel descriptor with its metadata, and environment files listing its (often inferred) dependencies. The run that owns this directory is queryable through the tracking server, which makes the model reproducible and loadable in other environments. This is why you can later retrieve it using mlflow.sklearn.load_model or download it directly.
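That structure can be checked before anything is deployed. The sketch below verifies that a downloaded model directory contains the MLmodel descriptor, which sits at the root of every MLflow model. The helper name missing_model_files is hypothetical, and any file beyond MLmodel (for example requirements.txt) should be treated as version-dependent rather than guaranteed.

```python
from pathlib import Path


def missing_model_files(model_dir, required=("MLmodel",)):
    """Return the names in `required` that are not regular files in `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]
```

A deployment script could call this on the download destination and abort if the returned list is not empty. Where exactly the model directory lands under the destination path may depend on the MLflow version, so it is worth confirming once by listing the downloaded files.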
The next logical step is integrating model serving directly from MLflow, potentially using MLflow’s built-in model serving capabilities or connecting to platforms like Seldon Core or Kubeflow.