MLflow’s batch inference support lets you score large datasets with Spark or Pandas. Its main strength is that it decouples model packaging from execution, so the same logged artifact can be scored in either environment and results are easier to reproduce.
Let’s see this in action. Imagine you’ve trained a scikit-learn model and logged it with MLflow.
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
# Generate some dummy data
X, y = make_classification(n_samples=100, n_features=2, random_state=42)
# Train a simple model
model = LogisticRegression()
model.fit(X, y)
# Log the model with MLflow
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "model")
    # Capture the run ID while the run is still active
    run_id = run.info.run_id
print(f"Model logged with run_id: {run_id}")
Now, instead of loading this model directly in your scoring script, you can leverage MLflow’s pyfunc flavor to create a generic Python function wrapper. This wrapper handles loading the model from its logged artifact location and provides a consistent predict interface, regardless of the original model type (scikit-learn, TensorFlow, PyTorch, etc.).
For batch inference, MLflow integrates seamlessly with Spark. You can load your MLflow-logged model as a Spark UDF (User Defined Function) and apply it to an entire DataFrame.
import mlflow.pyfunc
from pyspark.sql import SparkSession
# Initialize the Spark session
spark = SparkSession.builder.appName("MLflowBatchInference").getOrCreate()
# Load the MLflow model as a Spark UDF
# run_id comes from the logging step above; in a separate job, substitute your own run ID
model_uri = f"runs:/{run_id}/model"
loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri)
# Create a dummy Spark DataFrame for inference
data = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
columns = ["feature1", "feature2"]
df = spark.createDataFrame(data, columns)
# Apply the model UDF to the DataFrame
# Pass each feature column to the UDF as a separate argument, in the order used for training
df_with_predictions = df.withColumn("prediction", loaded_model(*[df[c] for c in df.columns]))
# Show the results
df_with_predictions.show()
This approach has two main advantages. First, it abstracts away model loading and prediction logic: your Spark job only needs the model_uri and the columns to pass to the UDF. Second, it loads the exact artifact that was logged at training time, and MLflow records the model’s library dependencies alongside it, which helps you detect or avoid discrepancies caused by mismatched library versions or environments. The MLflow Model Registry builds on this by letting you version models and promote them through stages (e.g., Staging, Production) or, in recent MLflow releases, aliases, which makes deployment and rollback straightforward.
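The model_uri can point either at a run artifact or at a registered model. As a minimal sketch, the helper below builds the URI forms MLflow documents (`runs:/<run_id>/<path>`, `models:/<name>/<version>`, and `models:/<name>@<alias>`). The function name `build_model_uri` and the model name `churn_classifier` are hypothetical and not part of MLflow.

```python
from typing import Optional


def build_model_uri(
    run_id: Optional[str] = None,
    artifact_path: str = "model",
    name: Optional[str] = None,
    version: Optional[int] = None,
    alias: Optional[str] = None,
) -> str:
    """Build an MLflow model URI string (hypothetical helper, not an MLflow API)."""
    if run_id is not None:
        # Artifact logged under a specific run
        return f"runs:/{run_id}/{artifact_path}"
    if name is None:
        raise ValueError("Provide either run_id or a registered model name")
    if version is not None:
        # A fixed version of a registered model
        return f"models:/{name}/{version}"
    if alias is not None:
        # Whatever version the alias currently points to
        return f"models:/{name}@{alias}"
    raise ValueError("A registered model needs a version or an alias")


print(build_model_uri(run_id="abc123"))
print(build_model_uri(name="churn_classifier", version=3))
print(build_model_uri(name="churn_classifier", alias="champion"))
```

Any of these strings can be passed where the examples in this article use model_uri, so switching a batch job from a run artifact to a registered version is a one-line change.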
The pyfunc wrapper handles deserializing the model and delegating prediction calls to the underlying library. When used with Spark, MLflow’s spark_udf function distributes the work across your cluster: each Spark executor loads its own copy of the model and applies it to its assigned partitions of the DataFrame.
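To make the idea of one model copy per executor concrete, here is a plain-Python sketch with no Spark involved. It simulates partitions as lists and caches the loaded model so that it is loaded once per worker process rather than once per row. The names `load_model_once` and `score_partition` and the stub model are hypothetical illustrations, not what spark_udf does internally.

```python
from functools import lru_cache
from typing import Callable, List, Sequence

Rows = Sequence[Sequence[float]]

LOAD_COUNT = 0  # counts how many times the stub model is "loaded"


@lru_cache(maxsize=None)
def load_model_once(model_uri: str) -> Callable[[Rows], List[float]]:
    """Stand-in for an expensive model load; cached per process and URI."""
    global LOAD_COUNT
    LOAD_COUNT += 1

    def predict(rows: Rows) -> List[float]:
        # Stub model: the "prediction" is the sum of each row's features
        return [float(sum(row)) for row in rows]

    return predict


def score_partition(model_uri: str, rows: Rows) -> List[float]:
    """Score one partition as a batch, reusing the cached model."""
    predict = load_model_once(model_uri)
    return predict(rows)


partitions = [[(1.0, 2.0), (3.0, 4.0)], [(5.0, 6.0)]]
results = [score_partition("runs:/<run_id>/model", part) for part in partitions]
print(results)     # [[3.0, 7.0], [11.0]]
print(LOAD_COUNT)  # 1
```

The point of the cache is that load cost is paid once per worker, so throughput is dominated by batch prediction rather than repeated deserialization.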
A notable property of this design is that the scoring code never changes with the model type. Although the model_uri points to one specific logged model, that model could have been trained with scikit-learn, XGBoost, or TensorFlow, and MLflow’s pyfunc interface loads it behind the same predict method. The pyfunc flavor acts as a universal adapter that hides each library’s prediction mechanics.
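The adapter idea can be illustrated without MLflow at all. In the sketch below, two stub models expose different native methods, and a small wrapper gives both the same `predict` call. All class names are hypothetical. MLflow’s actual pyfunc loader is more involved: it reads the flavor configuration stored in the model’s MLmodel file to decide how to load the artifact.

```python
from typing import Callable, List, Sequence

Rows = Sequence[Sequence[float]]


class StubThresholdModel:
    """Pretend native model whose scoring method is called `classify`."""

    def classify(self, rows: Rows) -> List[int]:
        return [1 if sum(row) > 5 else 0 for row in rows]


class StubLinearModel:
    """Pretend native model whose scoring method is called `run`."""

    def run(self, rows: Rows) -> List[float]:
        return [0.5 * sum(row) for row in rows]


class UnifiedModel:
    """Adapter exposing a single `predict` method over any native callable."""

    def __init__(self, native_predict: Callable[[Rows], list]):
        self._native_predict = native_predict

    def predict(self, rows: Rows) -> list:
        return self._native_predict(rows)


models = [
    UnifiedModel(StubThresholdModel().classify),
    UnifiedModel(StubLinearModel().run),
]
rows = [(1.0, 2.0), (3.0, 4.0)]
outputs = [m.predict(rows) for m in models]
print(outputs)  # [[0, 1], [1.5, 3.5]]
```

The calling loop only ever sees `predict`, which is why a batch job written against the pyfunc interface does not need to change when the underlying model library does.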
Beyond Spark, you can also perform batch inference using Pandas DataFrames with MLflow. This is useful for smaller datasets or when you don’t have a Spark cluster available.
import pandas as pd
# Load the MLflow model as a generic pyfunc model (no Spark required)
model_uri = f"runs:/{run_id}/model"
loaded_model_pd = mlflow.pyfunc.load_model(model_uri)
# Create a dummy Pandas DataFrame
data_pd = {'feature1': [1.0, 3.0, 5.0], 'feature2': [2.0, 4.0, 6.0]}
pdf = pd.DataFrame(data_pd)
# Make predictions
predictions_pd = loaded_model_pd.predict(pdf)
print(predictions_pd)
This flexibility allows you to choose the right tool for the job, whether it’s distributed processing with Spark or in-memory computation with Pandas, all while maintaining a consistent model artifact and inference API.
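When a dataset is too large to score in one call but too small to justify a cluster, one common pattern is to score it in fixed-size chunks. The sketch below is deliberately generic: `predict_fn` stands in for something like a loaded pyfunc model’s predict method, and `score_in_chunks` is a hypothetical helper, not an MLflow function.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

Row = Sequence[float]


def score_in_chunks(
    predict_fn: Callable[[List[Row]], Sequence[float]],
    rows: Iterable[Row],
    chunk_size: int = 1000,
) -> Iterator[float]:
    """Yield predictions while holding at most `chunk_size` rows in memory."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    while True:
        # Pull the next chunk; an empty chunk means the input is exhausted
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield from predict_fn(chunk)


def stub_predict(chunk: List[Row]) -> List[int]:
    # Stand-in for model.predict: flags rows whose feature sum exceeds 5
    return [1 if sum(row) > 5 else 0 for row in chunk]


data = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
predictions = list(score_in_chunks(stub_predict, data, chunk_size=2))
print(predictions)  # [0, 1, 1]
```

With Pandas, a similar effect can be had by reading the input with `pd.read_csv(..., chunksize=...)` and passing each resulting DataFrame to the model’s predict method in turn.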
The next step is to explore how to integrate this batch inference pipeline into a production workflow using MLflow Projects and deployment tools.