MLflow on Databricks is more than just a place to log your experiments; it’s a tightly integrated system designed to streamline the entire machine learning lifecycle, from initial development to production deployment.

Here’s a peek at it in action. Imagine you’re training a model on a Databricks cluster. You’d typically have code like this:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the Iris dataset and hold out 30% for evaluation
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Start an MLflow run; everything logged inside the block is attached to it
with mlflow.start_run(run_name="Iris RF Example") as run:
    # Log hyperparameters
    n_estimators = 100
    max_depth = 10
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)

    # Train the model
    rf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
    rf.fit(X_train, y_train)

    # Log test-set accuracy as a metric
    accuracy = rf.score(X_test, y_test)
    mlflow.log_metric("accuracy", accuracy)

    # Log the trained model as an artifact of this run
    mlflow.sklearn.log_model(rf, "random-forest-model")

# Use the run handle here: calling mlflow.get_artifact_uri() after the run has
# ended would silently start a new, empty run.
print(f"MLflow run completed. Artifacts logged to: {run.info.artifact_uri}")
```

When you run this in a Databricks notebook, mlflow.start_run() automatically logs to the managed MLflow Tracking Server associated with your workspace (by default, into an experiment tied to the notebook). You don’t need to stand up or configure a separate server; Databricks handles it. The run name, the parameters (n_estimators, max_depth), the metric (accuracy), and the trained scikit-learn model itself are all sent to this server and appear immediately in the MLflow UI within your workspace.

The core problem MLflow solves is the chaos of experimentation. Without it, tracking which code version produced which model, which hyperparameters were used, and how each run compared with the others is a manual, error-prone process. MLflow provides a centralized, auditable record.

Internally, the Databricks-managed MLflow service has two main components: the Tracking Server and the Model Registry.

The Tracking Server is where all your mlflow.log_param(), mlflow.log_metric(), and mlflow.log_artifact() calls go. It keeps metadata (run names, parameters, metrics, tags) in a managed database and artifacts (model files, plots, other outputs) in cloud object storage such as S3 or ADLS Gen2. On Databricks, both stores are provisioned and managed for you.

The Model Registry builds on top of the Tracking Server. It’s a central place to manage the lifecycle of your ML models. Once a logged model is registered, each of its versions can be moved through stages: Staging, Production, and Archived. This lets teams collaborate on models, promote them through testing, and deploy them with confidence, all while keeping version history and lineage. For example, you might register the model artifact from a specific run as version 1 of "MyIrisClassifier", transition version 1 to Staging for QA, and later promote it to Production. (Recent MLflow releases deprecate stages in favor of model aliases, and models registered in Unity Catalog use aliases only.)

What many people don’t realize is how deeply MLflow’s lineage tracking is integrated into Databricks. When you register a model artifact from a specific MLflow run, the Model Registry doesn’t just store the model files; it records a pointer back to the exact run that produced it. That run holds all of its parameters and metrics and, if you’re using Databricks Repos, the Git commit hash of the code. If your production model ever misbehaves, you can click straight through to the original run, see exactly what went into it, and reproduce it if necessary. This end-to-end traceability is valuable for both debugging and auditing.

Once you’ve moved a model to Production in the Model Registry, the next natural step is to serve it for real-time inference, often using Databricks Model Serving.
