MLflow + XGBoost and LightGBM: Track Tree Models (2026)

Tracking tree models in MLflow takes only a few lines, and the payoff goes beyond the model file: each run can record the hyperparameters and library versions that produced the trees, so you can later see how a model was trained and rebuild a matching environment.

Let’s watch MLflow in action. Imagine you’re training an XGBoost model.

import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import mlflow

# Load data
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Start MLflow run
mlflow.set_experiment("XGBoost Tree Tracking")
with mlflow.start_run():
    # Define XGBoost parameters
    params = {
        "objective": "multi:softmax",
        "num_class": 3,
        "eval_metric": "mlogloss",
        "eta": 0.1,
        "max_depth": 3,
        "seed": 42
    }

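    # Log parameters, plus the number of boosting rounds passed to xgb.train below
    mlflow.log_params(params)
    mlflow.log_param("num_boost_round", 100)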

    # Train model
    dtrain = xgb.DMatrix(X_train, label=y_train)
    model = xgb.train(params, dtrain, num_boost_round=100)

    # Log the model
    mlflow.xgboost.log_model(model, "model")

    # Evaluate on the held-out split and log the result
    dtest = xgb.DMatrix(X_test, label=y_test)
    predictions = model.predict(dtest)  # multi:softmax returns class indices as floats
    accuracy = float((predictions == y_test).mean())
    mlflow.log_metric("accuracy", accuracy)

# The run has ended at this point, so mlflow.active_run() would return None
print(f"MLflow run completed. Model logged under run ID: {mlflow.last_active_run().info.run_id}")

This code does a few key things. It sets up an MLflow experiment, starts a run, defines XGBoost parameters, logs those parameters using mlflow.log_params(), trains the model, and then logs the trained XGBoost model using mlflow.xgboost.log_model(). MLflow stores the model artifact together with an MLmodel descriptor and files that list its Python dependencies (such as requirements.txt and conda.yaml).

Now, let’s see how you’d load and use that logged model.

import mlflow
import xgboost as xgb

# Replace with your actual run ID
run_id = "REPLACE_WITH_YOUR_RUN_ID"
model_uri = f"runs:/{run_id}/model"

# Load the model
loaded_model = mlflow.xgboost.load_model(model_uri)

# Make predictions
# Assuming you have X_test from the previous script
# For demonstration, let's create dummy data
X_test_dummy = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]] # Sample from Iris dataset
dtest_dummy = xgb.DMatrix(X_test_dummy)
predictions = loaded_model.predict(dtest_dummy)

print("Predictions:", predictions)

When you run mlflow.xgboost.load_model(model_uri), MLflow fetches the saved artifacts and returns a native XGBoost Booster, ready for inference with the XGBoost version installed in your current environment. The environment details captured at logging time are stored next to the model, but load_model does not reinstall them for you.

This addresses the reproducibility problem for complex models. Instead of only a .pkl or .ubj file, a run holds the model artifact plus its metadata: the hyperparameters you logged, the XGBoost version recorded in the dependency files, and other environment details such as the Python version. With that record you can rebuild a matching environment and retrain or serve the model later, even after the libraries on your machine have changed.
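To see how much of that record applies to the machine you are on, you can compare the versions pinned in the model’s requirements file with what is currently installed. The sketch below is one way to do that, not a built-in MLflow feature: parse_pins, find_mismatches, and check_model_environment are hypothetical helper names, and it assumes the model was logged with a pip requirements file (MLflow’s default) and that mlflow.pyfunc.get_model_dependencies is available in your MLflow version.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for requirement lines pinned with '=='."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0]        # drop trailing comments
        line = line.split(";", 1)[0].strip()    # drop environment markers
        if "==" not in line or line.startswith("-"):
            continue  # skip unpinned requirements and pip options
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip().lower()  # drop extras such as [extra]
        pins[name] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {package: (recorded, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, recorded in pins.items():
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != recorded:
            mismatches[name] = (recorded, installed)
    return mismatches


def check_model_environment(model_uri):
    """Compare a logged model's pinned requirements with this environment."""
    # Imported lazily so the helpers above can be used without MLflow installed.
    import mlflow.pyfunc

    requirements_path = mlflow.pyfunc.get_model_dependencies(model_uri, format="pip")
    with open(requirements_path) as f:
        return find_mismatches(parse_pins(f.read()))
```

Calling check_model_environment(f"runs:/{run_id}/model") returns an empty dictionary when every pinned package matches the installed version, and otherwise maps each differing package to its recorded and installed versions.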

For LightGBM, the process is very similar, leveraging mlflow.lightgbm.log_model().

import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import mlflow

# Load data (same as before)
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Start MLflow run
mlflow.set_experiment("LightGBM Tree Tracking")
with mlflow.start_run():
    # Define LightGBM parameters
    params = {
        "objective": "multiclass",
        "num_class": 3,
        "metric": "multi_logloss",
        "learning_rate": 0.1,
        "max_depth": 3,
        "seed": 42
    }

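    # Log parameters, plus the number of boosting rounds passed to lgb.train below
    mlflow.log_params(params)
    mlflow.log_param("num_boost_round", 100)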

    # Train model
    train_data = lgb.Dataset(X_train, label=y_train)
    model = lgb.train(params, train_data, num_boost_round=100)

    # Log the model
    mlflow.lightgbm.log_model(model, "model")

# The run has ended at this point, so mlflow.active_run() would return None
print(f"MLflow run completed. Model logged under run ID: {mlflow.last_active_run().info.run_id}")

And loading it:

import mlflow
import lightgbm as lgb

run_id = "REPLACE_WITH_YOUR_RUN_ID"
model_uri = f"runs:/{run_id}/model"

loaded_model = mlflow.lightgbm.load_model(model_uri)

# Make predictions
X_test_dummy = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
predictions = loaded_model.predict(X_test_dummy)

print("Predictions:", predictions)
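The two loaded models do not return the same thing here. With the multi:softmax objective, the XGBoost Booster’s predict returns one class index per row (as floats), while a LightGBM Booster trained with the multiclass objective returns one probability per class for each row. A small helper can bring both to plain class indices. This is a minimal sketch: to_class_indices is a hypothetical name, the sample values are made up for illustration, and it assumes predictions are either a 1-D sequence of labels or a 2-D sequence of per-class scores.

```python
def to_class_indices(predictions):
    """Convert model output to a list of integer class indices.

    Accepts either 1-D class labels (possibly floats, as XGBoost's
    multi:softmax returns) or 2-D per-class scores (as LightGBM's
    multiclass objective returns).
    """
    classes = []
    for row in predictions:
        try:
            scores = list(row)
        except TypeError:
            # A scalar: already a class label, possibly stored as a float.
            classes.append(int(row))
            continue
        # A row of per-class scores: pick the index of the largest one.
        classes.append(max(range(len(scores)), key=scores.__getitem__))
    return classes


# Illustrative values only, shaped like the two kinds of output.
xgboost_style = [0.0, 2.0, 1.0]
lightgbm_style = [[0.90, 0.05, 0.05], [0.10, 0.20, 0.70]]

print(to_class_indices(xgboost_style))
print(to_class_indices(lightgbm_style))
```

The helper also works on NumPy arrays, since iterating a 1-D array yields scalars and iterating a 2-D array yields rows. With NumPy at hand, numpy.argmax(predictions, axis=1) does the same job for the 2-D case.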

The core mental model is that mlflow.<framework>.log_model acts as a serializer with metadata. It saves the model object and bundles it with an MLmodel configuration file and environment files (such as requirements.txt and conda.yaml) that describe where it was trained. When mlflow.<framework>.load_model is called, MLflow reads the MLmodel file, locates the saved model data for that flavor, and deserializes it with the library installed in the current environment. The recorded library versions are there so you can recreate the original environment when the two differ.

A common pitfall is forgetting to log the parameters separately. The saved model carries some of its own configuration, but log_model does not record your hyperparameters as run parameters. Logging them explicitly with mlflow.log_params() gives you a clear, human-readable record of the settings you used, which makes it much easier to compare training runs and understand why one model performed better than another. These logged parameters are directly visible in the MLflow UI alongside the model artifact.
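When settings live in more than one place (booster parameters, arguments to the training call, data split options), it can help to gather them into a single flat dictionary before calling mlflow.log_params(). The following is a minimal sketch under that assumption: flatten_params and the grouping keys (booster, train, data) are hypothetical names chosen for illustration, and the values are the ones used in the XGBoost example.

```python
def flatten_params(params, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the nested group, extending the dotted prefix.
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical grouping of everything that shaped the training run.
settings = {
    "booster": {"eta": 0.1, "max_depth": 3, "seed": 42},
    "train": {"num_boost_round": 100},
    "data": {"test_size": 0.2, "random_state": 42},
}

flat_params = flatten_params(settings)
print(flat_params)

# Inside a run, a single call then records every setting:
#     mlflow.log_params(flat_params)
```

MLflow stores parameter values as strings, and mlflow.search_runs() returns them in columns named params.<key>, so a setting logged this way shows up as, for example, params.booster.eta.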

The next step you’ll likely encounter is managing model versions, for example by registering models in the MLflow Model Registry, so you can track improvements and roll back if necessary.