MLflow lets you track the whole of a machine learning experiment, not just the final model. With scikit-learn, that means logging not only the metrics of your best model but also the entire preprocessing and modeling workflow that produced them.
Let’s see what this looks like in practice. Imagine you’re building a simple linear regression model.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import mlflow
import mlflow.sklearn

# Sample Data
data = {'feature1': [i for i in range(100)],
        'feature2': [i*2 for i in range(100)],
        'target': [i*3 + 5 for i in range(100)]}
df = pd.DataFrame(data)
X = df[['feature1', 'feature2']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the MLflow experiment
mlflow.set_experiment("Scikit-learn Pipeline Logging Example")

# Start an MLflow run
with mlflow.start_run():
    # Define the scikit-learn pipeline
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('regressor', LinearRegression())
    ])

    # Train the pipeline
    pipeline.fit(X_train, y_train)

    # Make predictions
    y_pred = pipeline.predict(X_test)

    # Evaluate the model
    mse = mean_squared_error(y_test, y_pred)
    rmse = mse**0.5

    # Log parameters
    mlflow.log_param("test_size", 0.2)
    mlflow.log_param("random_state", 42)
    mlflow.log_param("model_type", "LinearRegression")
    mlflow.log_param("scaler_type", "StandardScaler")

    # Log metrics
    mlflow.log_metric("mse", mse)
    mlflow.log_metric("rmse", rmse)

    # Log the scikit-learn pipeline
    mlflow.sklearn.log_model(pipeline, "model")

    # The run is only active inside the with block, so read its ID here
    print(f"MLflow Run ID: {mlflow.active_run().info.run_id}")
    print(f"MSE: {mse}")
    print(f"RMSE: {rmse}")
This code sets up a basic scikit-learn pipeline, trains it, makes predictions, and calculates metrics. The key part is mlflow.sklearn.log_model(pipeline, "model"). This command doesn’t just save the trained LinearRegression model; it serializes the entire Pipeline object. This means when you load this artifact later, you get back a fully functional pipeline, including the StandardScaler and the LinearRegression step, ready to be used for new predictions.
The problem this solves is the "it works on my machine" phenomenon, but for ML. Without logging the pipeline, you might save just the trained LinearRegression model. Then, when you want to make predictions on new data, you’d have to remember to apply the StandardScaler before feeding it to the model, and you’d have to apply it with the same parameters it was trained with. MLflow’s pipeline logging captures that entire sequence. You can reconstruct the exact preprocessing and modeling steps that led to a particular result.
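To make that failure mode concrete, here is a minimal sketch that uses only scikit-learn and NumPy. It assumes synthetic data with the same shape as the example above (target = 3 * feature1 + 5) and compares predictions from the full pipeline with predictions from the bare regressor pulled out of it. The variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data mirroring the example above: target = 3 * feature1 + 5.
feature1 = np.arange(100, dtype=float)
X = np.column_stack([feature1, 2 * feature1])
y = 3 * feature1 + 5

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
]).fit(X, y)

X_new = np.array([[200.0, 400.0]])  # raw, unscaled input

# The full pipeline scales the input before the regressor sees it.
full_prediction = pipeline.predict(X_new)[0]

# The bare regressor was fitted on scaled features, so feeding it raw
# values skips a step it depends on and the prediction is far off.
bare_regressor = pipeline.named_steps["regressor"]
bare_prediction = bare_regressor.predict(X_new)[0]

print(f"pipeline:       {full_prediction:.1f}")
print(f"regressor only: {bare_prediction:.1f}")
```

The pipeline recovers 3 * 200 + 5 = 605, while the regressor on its own does not, which is exactly the mistake that logging the whole pipeline guards against.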
Internally, mlflow.sklearn.log_model serializes the scikit-learn pipeline object with cloudpickle by default (as of MLflow 2.x); standard pickle is available through the serialization_format argument. It writes the serialized file (named model.pkl) into the logged model directory in your MLflow artifact store, next to an MLmodel metadata file and environment files such as requirements.txt and conda.yaml. Parameters and metrics are recorded separately in the tracking backend. When you call mlflow.sklearn.load_model(), MLflow downloads the model directory and deserializes model.pkl, reconstructing the Python object. The flavor here is important; mlflow.sklearn knows how to handle scikit-learn objects specifically.
When you run this script, you’ll see output including an MLflow Run ID. If you start the MLflow UI (e.g., by running mlflow ui in your terminal, in the directory where the script stored its runs), you can open it in your browser. You’ll see a new experiment named "Scikit-learn Pipeline Logging Example" and within it, a run. Clicking on that run will show you the logged parameters (test_size, random_state, etc.), the logged metrics (mse, rmse), and an "Artifacts" section. Under artifacts, you’ll find a directory named "model" containing MLmodel, model.pkl, and environment files such as requirements.txt and conda.yaml. The MLmodel file is a metadata file that MLflow uses to understand how to load the artifact, specifying the sklearn flavor.
A further benefit of logging the full pipeline through MLflow is that the library versions used to create it are recorded alongside it. By default, when MLflow logs a scikit-learn model it infers the packages the model depends on and writes them, with pinned versions, to requirements.txt and conda.yaml in the model directory. If you later load that model in an environment with different library versions (e.g., an older or newer scikit-learn), MLflow may warn about the mismatch, and scikit-learn itself emits a warning when unpickling an estimator saved with a different version. The load is not blocked for you, though: it may succeed, fail, or quietly behave differently, so the safest route is to recreate the recorded environment. This is crucial for reproducibility because scikit-learn’s behavior can change subtly between versions.
Once you’ve logged your pipeline, the next logical step is to experiment with different preprocessing steps, like PolynomialFeatures or KBinsDiscretizer, and log those variations to compare their impact on your metrics.