MLflow + Optuna: Hyperparameter Tuning with Tracking (2026)

MLflow and Optuna complement each other in machine learning projects: Optuna handles efficient hyperparameter search, while MLflow handles experiment logging, visualization, and reproducibility. Combined, they give you hyperparameter tuning in which every trial is tracked.

Here’s a glimpse of them in action. Imagine you’re tuning a Random Forest classifier’s hyperparameters, including n_estimators and max_depth:

import optuna
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# 1. Set up MLflow Tracking
# With the default local file-based store, this creates a 'mlruns' directory if it doesn't exist.
# You can also set MLFLOW_TRACKING_URI environment variable to point to a remote server.
mlflow.set_experiment("Optuna + MLflow Example")

# 2. Define the Objective Function for Optuna
def objective(trial: optuna.Trial):
    # Define hyperparameters to tune
    n_estimators = trial.suggest_int("n_estimators", 50, 500)
    max_depth = trial.suggest_int("max_depth", 2, 32)
    min_samples_split = trial.suggest_int("min_samples_split", 2, 20)
    min_samples_leaf = trial.suggest_int("min_samples_leaf", 1, 20)
    bootstrap = trial.suggest_categorical("bootstrap", [True, False])

    # Instantiate the model
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        bootstrap=bootstrap,
        random_state=42 # for reproducibility of the model itself
    )

    # Generate some sample data
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train the model
    model.fit(X_train, y_train)

    # Make predictions
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    # 3. Log parameters and metrics with MLflow within the objective function.
    # This is the key integration point. Each trial opens its own run: without
    # start_run(), every trial would write to one implicit run, and logging a
    # different value for an already-logged parameter raises an MlflowException.
    with mlflow.start_run(run_name=f"trial-{trial.number}"):
        mlflow.log_params(trial.params)  # all five hyperparameters suggested above
        mlflow.log_metric("accuracy", accuracy)

        # Optionally, log the model itself
        mlflow.sklearn.log_model(model, "random_forest_model")

    return accuracy

# 4. Create an Optuna study and run the optimization
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100) # Run 100 trials

print("Number of finished trials: ", len(study.trials))
print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)
print("  Params: ")
for key, value in trial.params.items():
    print(f"    {key}: {value}")

# To view the MLflow UI, run 'mlflow ui' in your terminal in the directory where 'mlruns' is created.

This code snippet defines an Optuna objective function that trains a scikit-learn model. Crucially, the objective also calls MLflow’s logging functions to record the hyperparameters and the resulting accuracy for each trial, and mlflow.sklearn.log_model saves the trained model as an artifact. As long as each trial logs inside its own run (opened with mlflow.start_run()), running mlflow ui in your terminal shows every trial as a distinct run, with its parameters, metrics, and saved model accessible from the dashboard. Without a per-trial run, all trials write to a single implicit run, and MLflow rejects any attempt to log a different value for a parameter that run already holds.

The core problem MLflow and Optuna solve together is bridging the gap between efficient hyperparameter exploration and systematic experiment management. Optuna’s samplers (such as TPE and CMA-ES) and pruners usually reach good hyperparameters in fewer trials than grid or random search. MLflow, in turn, ensures that none of these trials is lost. It keeps a record of each run: the specific hyperparameters, the evaluation metrics, the trained model artifact, the Git commit of the code (when MLflow can detect a Git repository), and the training data if you log it. This lets you revisit any past experiment, see why a particular set of hyperparameters worked well, and reproduce the result.

Internally, Optuna manages the search space and suggests hyperparameter combinations. For each suggestion, it calls your objective function. Inside objective, you perform your ML task and then, using the mlflow library, you record the outcome. MLflow’s tracking store acts as the backend that holds this information. When you call mlflow.log_param("param_name", value), MLflow writes this key-value pair to the currently active run, starting one implicitly if none is active. Similarly, mlflow.log_metric("metric_name", value) records a numerical metric; logging the same metric several times within a single run (e.g., epoch-wise loss, using the step argument) builds a history that the UI can plot as a time series. mlflow.sklearn.log_model serializes your scikit-learn model, typically with a pickle-based format such as cloudpickle, and saves it as an artifact associated with that specific MLflow run.

The most surprising thing is how seamlessly MLflow fits inside the Optuna objective function itself, with no separate wrapper required. You don’t need to tell Optuna about MLflow; you just tell your training code to log to MLflow. Optuna focuses on the search, and MLflow captures whatever the objective logs during that search. You can still use advanced Optuna features like early stopping (pruning), provided the objective reports intermediate values with trial.report() and checks trial.should_prune(). A pruned trial’s run keeps every intermediate metric logged before the stop, but MLflow does not know about pruning on its own: an optuna.TrialPruned exception that escapes a with mlflow.start_run() block marks the run as FAILED, so tag the run explicitly if you want pruned trials to be distinguishable. That record is invaluable for understanding how efficiently the search space is being explored.

The next logical step after mastering hyperparameter tuning with tracking is exploring distributed hyperparameter optimization with MLflow and Optuna, where multiple machines or processes collaborate on the search.