MLflow can be configured to store run artifacts in Azure Blob Storage, but by default it logs everything locally, which is often not what you want when running experiments at scale.
Here’s how to get your MLflow runs to reliably log to Azure Blob Storage, so you can share results and artifacts across teams and environments.
The Problem: Local Logging by Default
When you first start using MLflow, it creates an mlruns directory in your current working directory. All your experiment runs, parameters, metrics, and artifacts get saved there. This is great for getting started, but it quickly becomes a bottleneck. If you’re running experiments on a different machine, or if you want to share results, that local mlruns directory isn’t accessible. Azure Blob Storage addresses this by giving every machine and teammate the same central location for the files your runs produce.
The Solution: Pointing MLflow to Azure Blob Storage
To achieve this, you tell MLflow which container to write to and give it credentials for the storage account, either through environment variables or directly in your Python code.
1. Set Up Azure Blob Storage:
First, you need an Azure Storage Account and a container within it.
- Storage Account: Create a Storage Account in the Azure portal. Note down its Name and Access Key.
- Container: Inside your Storage Account, create a Blob container (e.g., mlflow-runs).
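Azure rejects names that break its naming rules, so it can save a failed deployment to check them locally first. The sketch below encodes the rules as Azure documents them: account names are 3 to 24 lowercase letters and digits, and container names are 3 to 63 lowercase letters, digits, and single hyphens, starting and ending with a letter or digit. The function names are hypothetical helpers, not part of any Azure SDK.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Container names: lowercase letters, digits, and hyphens; must start and end
# with a letter or digit, and hyphens may not be consecutive.
_CONTAINER_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])+")


def is_valid_account_name(name: str) -> bool:
    """Check a storage account name against the length and character rules."""
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    """Check a blob container name against the length and character rules."""
    return 3 <= len(name) <= 63 and _CONTAINER_RE.fullmatch(name) is not None
```

Note that the placeholder your_storage_account_name used in this article would itself fail the check because of its underscores and length; your real account name will look more like mystorageacct01.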
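2. Configure MLflow’s Artifact Location:
MLflow keeps two kinds of data. Parameters, metrics, and run metadata go to the backend store, which is selected by the MLFLOW_TRACKING_URI environment variable and must be a local path, a database URL, or the address of an MLflow tracking server. Files and models go to the artifact store, and this is the part that can live in Azure Blob Storage. For Azure Blob Storage, the artifact URI format is:
    wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/<path>
You’ll also need to provide authentication. MLflow reads either a connection string from AZURE_STORAGE_CONNECTION_STRING or the storage account’s access key from AZURE_STORAGE_ACCESS_KEY. If neither is set, it falls back to DefaultAzureCredential from the azure-identity package. The access key is the simplest option, but it grants full access to the account, so prefer an identity-based login where you can. The Azure client library is not installed with MLflow by default; add it with pip install azure-storage-blob.
Example using environment variables and a tracking server (Bash):
    export AZURE_STORAGE_ACCOUNT_NAME="your_storage_account_name"  # convenience variable; MLflow does not read it
    export AZURE_STORAGE_ACCESS_KEY="your_storage_account_access_key"
    mlflow server --backend-store-uri sqlite:///mlflow.db \
        --default-artifact-root "wasbs://mlflow-runs@${AZURE_STORAGE_ACCOUNT_NAME}.blob.core.windows.net"
Replace "your_storage_account_name" and "your_storage_account_access_key" with your actual credentials. Clients then set MLFLOW_TRACKING_URI to the server’s address (http://127.0.0.1:5000 by default). They need the same Azure credentials, because with this setup they upload artifacts to the container directly.
Example using Python (no server, artifact location set per experiment):
    import os
    import mlflow

    # MLflow's Azure Blob artifact store reads the key from this variable
    os.environ["AZURE_STORAGE_ACCESS_KEY"] = "your_storage_account_access_key"

    # Params and metrics stay in the backend store (a local SQLite file here;
    # use a shared database or tracking server if teammates need to see them)
    mlflow.set_tracking_uri("sqlite:///mlflow.db")

    # Artifacts for this experiment go to the mlflow-runs container
    artifact_location = "wasbs://mlflow-runs@your_storage_account_name.blob.core.windows.net"
    if mlflow.get_experiment_by_name("azure-blob-demo") is None:
        mlflow.create_experiment("azure-blob-demo", artifact_location=artifact_location)
    mlflow.set_experiment("azure-blob-demo")

    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)
        mlflow.log_metric("accuracy", 0.85)
        mlflow.log_text("hello from MLflow", "notes.txt")  # uploaded to Blob Storage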
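If you build the artifact location in several scripts, a small helper keeps the format in one place. This is a minimal sketch that assumes the wasbs://<container>@<account>.blob.core.windows.net/<path> form that MLflow documents for Azure Blob artifact storage on the public Azure cloud. The name build_wasbs_uri is hypothetical, not an MLflow function.

```python
def build_wasbs_uri(account: str, container: str, path: str = "") -> str:
    """Return a wasbs:// artifact URI for the given account and container."""
    if not account or not container:
        raise ValueError("account and container must be non-empty")
    base = f"wasbs://{container}@{account}.blob.core.windows.net"
    # Normalize the optional path so the result never contains "//" or a
    # trailing slash.
    path = path.strip("/")
    return f"{base}/{path}" if path else base
```

The result can be passed as the artifact_location argument of mlflow.create_experiment, or to the --default-artifact-root option of mlflow server.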
3. Common Causes for Failure and How to Fix Them:
- Incorrect Storage Account Name or Access Key:
  - Diagnosis: Check the storage account name in your URI and the key that MLflow reads from AZURE_STORAGE_ACCESS_KEY (or the connection string in AZURE_STORAGE_CONNECTION_STRING). A key stored under any other variable name, such as AZURE_STORAGE_ACCOUNT_KEY, is ignored by MLflow. Ensure the values are copied exactly from the Azure portal. To verify the credentials independently of MLflow, list the blobs in the container with the Azure CLI (az storage blob list --account-name <name> --account-key <key> --container-name <container>).
  - Fix: Update the URI and the environment variable with the exact name and key.
  - Why it works: MLflow uses these credentials to authenticate with Azure Blob Storage and read or write data. If they are wrong, every upload fails.
- Container Not Found or Typo:
  - Diagnosis: Verify the container name in the URI you gave MLflow (e.g., mlflow-runs). Ensure it exists in your storage account.
  - Fix: Create the container in Azure Blob Storage or correct the name in the URI.
  - Why it works: The URI must point to an existing container before MLflow can write to it.
- Network Connectivity Issues:
  - Diagnosis: Your machine or the environment where MLflow is running might not have access to Azure Blob Storage endpoints. Check firewall rules, VNet configurations, or proxy settings.
  - Fix: Ensure outbound connections to *.blob.core.windows.net on port 443 are allowed. If using a proxy, configure MLflow or your environment to use it.
  - Why it works: MLflow needs to establish a secure HTTPS connection to Azure Blob Storage.
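- MLflow Version Incompatibility or Missing Azure Package:
  - Diagnosis: Older versions of MLflow might have had less robust or different Azure integration. Also check that the azure-storage-blob package is installed, since MLflow does not install it by default.
  - Fix: Ensure you are using a recent version of MLflow and have the Azure client library (pip install --upgrade mlflow azure-storage-blob).
  - Why it works: Newer versions often include bug fixes and improved support for cloud storage backends, and MLflow relies on azure-storage-blob for every call to Blob Storage.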
- Permissions Issues (if using Managed Identity or Service Principal):
  - Diagnosis: If you’re not using an access key and instead using Managed Identity or a Service Principal, verify that the identity has the "Storage Blob Data Contributor" role, or an equivalent, assigned on the storage account or container.
  - Fix: Assign the necessary role to the Managed Identity or Service Principal in Azure IAM.
  - Why it works: MLflow needs explicit permissions to perform read/write operations on the Blob Storage container.
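- URI Format Incorrect (e.g., an azure:// prefix instead of wasbs://):
  - Diagnosis: MLflow’s built-in Azure Blob support recognizes the wasbs:// scheme, with the container name placed before the @. An azure:// URI is not one of the schemes MLflow ships with. Passing a blob URL as MLFLOW_TRACKING_URI also fails, because the tracking URI must be a local path, a database URL, or a tracking server address.
  - Fix: Use wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/<path> as the artifact location, and keep MLFLOW_TRACKING_URI pointed at your backend store or server.
  - Why it works: The scheme tells MLflow which artifact storage implementation to use.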
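Several of the checks above can be run before any experiment starts. The following is a rough sketch of such a preflight check, not an MLflow feature: the function name preflight is hypothetical, it assumes the public-cloud *.blob.core.windows.net endpoint and the wasbs:// artifact URI form, and it only inspects strings without contacting Azure.

```python
from typing import List, Mapping
from urllib.parse import urlparse

# Environment variables MLflow's Azure Blob artifact store reads for credentials.
CREDENTIAL_VARS = ("AZURE_STORAGE_CONNECTION_STRING", "AZURE_STORAGE_ACCESS_KEY")


def preflight(artifact_uri: str, env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    parsed = urlparse(artifact_uri)
    if parsed.scheme != "wasbs":
        problems.append(f"scheme is {parsed.scheme!r}, expected 'wasbs'")
    # In wasbs://<container>@<account>..., the container parses as the username.
    if not parsed.username:
        problems.append("no container name before '@'")
    host = parsed.hostname or ""
    if not host.endswith(".blob.core.windows.net"):
        problems.append(f"host {host!r} is not a *.blob.core.windows.net endpoint")
    if not any(env.get(name) for name in CREDENTIAL_VARS):
        problems.append(
            "no access key or connection string set; "
            "only DefaultAzureCredential can authenticate"
        )
    return problems
```

Call it as preflight(uri, os.environ) at the top of a training script and print whatever it returns. The last message is a warning rather than an error if you intentionally rely on Managed Identity or a Service Principal.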
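Verifying the Setup
After configuring, run a simple MLflow experiment that logs at least one artifact:
    import mlflow

    # Ensure credentials are set and the experiment's artifact location
    # is a wasbs:// URI
    # mlflow.set_tracking_uri(...)
    # mlflow.set_experiment(...)
    with mlflow.start_run(run_name="Azure Blob Test"):
        mlflow.log_param("param1", 5)
        mlflow.log_metric("metric1", 0.99)
        # log_text uploads a small file, so something lands in the container
        mlflow.log_text("upload check", "check.txt")
        print("Artifacts stored at:", mlflow.get_artifact_uri())
Navigate to your Azure Blob Storage container in the Azure portal. You should see a folder named after the run ID, containing an artifacts folder with check.txt inside. Parameters and metrics do not appear in the container; they live in the backend store and show up in the MLflow UI.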
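To know where to look in the portal, it helps to work out the blob prefix a run should write under. This sketch assumes MLflow’s default layout, in which a run’s artifacts sit at <experiment artifact location>/<run_id>/artifacts, and a wasbs:// artifact location. The function name and the run ID in the usage note are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def expected_blob_prefix(artifact_location: str, run_id: str) -> Tuple[str, str]:
    """Return (container, blob_prefix) for a run under a wasbs:// location."""
    parsed = urlparse(artifact_location)
    if parsed.scheme != "wasbs" or not parsed.username:
        raise ValueError(
            "expected wasbs://<container>@<account>.blob.core.windows.net/<path>"
        )
    # Join the optional base path, the run ID, and the fixed "artifacts" folder,
    # skipping the base path when the location is the container root.
    parts = [p for p in (parsed.path.strip("/"), run_id, "artifacts") if p]
    return parsed.username, "/".join(parts) + "/"
```

For example, expected_blob_prefix("wasbs://mlflow-runs@acct.blob.core.windows.net/exp", "abc123") points you at the exp/abc123/artifacts/ prefix in the mlflow-runs container. When the location comes from a tracking server’s default artifact root, MLflow adds an experiment ID segment first, so treat the value printed by mlflow.get_artifact_uri() inside a run as authoritative.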
The next error you might encounter is related to artifact storage limits or the cost associated with frequent writes to Blob Storage.