MLflow logging is less about recording values and more about structuring your ML experiments so they can be reproduced and compared.

Let’s see it in action. Imagine a simple Python script training a scikit-learn model:

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define hyperparameters
C_val = 0.1
solver_val = 'liblinear'

# Start an MLflow run
with mlflow.start_run(run_name="Iris Logistic Regression"):
    # Log parameters
    mlflow.log_param("C", C_val)
    mlflow.log_param("solver", solver_val)

    # Initialize and train the model
    model = LogisticRegression(C=C_val, solver=solver_val, random_state=42)
    model.fit(X_train, y_train)

    # Make predictions
    y_pred = model.predict(X_test)

    # Calculate metric
    accuracy = accuracy_score(y_test, y_pred)

    # Log metric
    mlflow.log_metric("accuracy", accuracy)

    # Log the model as an artifact
    mlflow.sklearn.log_model(model, "iris_model")

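    # Read the logged values back from the tracking store.
    # mlflow.get_run() requires the run ID as an argument.
    run_id = mlflow.active_run().info.run_id
    logged_run = mlflow.get_run(run_id)
    print(f"MLflow Run ID: {run_id}")
    print(f"Logged Parameter C: {logged_run.data.params['C']}")
    print(f"Logged Metric Accuracy: {logged_run.data.metrics['accuracy']}")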

When you run this script, MLflow creates a local mlruns directory (or uses a configured tracking server). Inside, it organizes your experiments. Each mlflow.start_run() creates a distinct "run," which is a unique instance of your experiment execution.

Within each run, you have three primary types of logged information:

  • Parameters: These are the inputs to your experiment. In the example, C_val and solver_val are logged under the keys "C" and "solver" with the values 0.1 and "liblinear". MLflow stores these as key-value pairs, and the values are saved as strings. This is crucial because it captures exactly which settings were used for a particular training run. You can later search and filter runs based on parameter values.

  • Metrics: These are the numeric outputs or performance indicators of your experiment. In the example, the calculated accuracy (0.9777...) is logged. A metric can be logged multiple times within a single run, which lets you track how it changes over time (e.g., across training epochs). MLflow stores each value as a key-value pair with an associated timestamp and an optional step number.

  • Artifacts: These are the files produced by your experiment. This is the most flexible category and can include trained models (like the scikit-learn model logged using mlflow.sklearn.log_model), data files, plots (e.g., matplotlib figures), configuration files, or even entire directories. MLflow stores these as files within the run’s directory structure, making them accessible for later download or use.

Much of the payoff comes from MLflow’s UI, which you start by running mlflow ui in your terminal from the directory containing mlruns. This web interface lets you view all your runs, compare them side by side, filter by parameters and metrics, and download artifacts. Structured logging turns a scattered collection of scripts and output files into a searchable, reproducible history of your experiments.

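The real value of mlflow.<flavor>.log_model (here, mlflow.sklearn.log_model) is that it saves more than the serialized model object. It also records metadata describing the model flavor and the environment dependencies needed to run it (for example, requirements.txt and conda.yaml files), and it can optionally bundle supporting code. That lets you load and use the model later in a different environment with mlflow.<flavor>.load_model().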

The next step is understanding how to leverage the MLflow tracking server for collaborative and centralized experiment management.
