MLflow and Spark

Bakhruz Dzhafarov 60 Reputation points

Hi, I ran into the following problem when trying to use a model for Spark inference (via mlflow.pyfunc.spark_udf) that I had previously trained with pandas and saved to MLflow.

I saved the model with:

import mlflow
from azureml.core import Workspace

# Connect to your Azure ML workspace
ws = Workspace.from_config()  # Make sure you have a config.json file

# Point MLflow at the Azure ML workspace tracking server
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

# Use a dedicated experiment for the run
experiment_name = "CatBoost_Experiment"
mlflow.set_experiment(experiment_name)

# Log the CatBoost model inside an active run and register it
with mlflow.start_run():
    mlflow.catboost.log_model(model, "catboost_model_20", registered_model_name="catboost_model")

and then loaded it back for Spark inference with:

model_uri = "models:/catboost_model/latest"  # Use the latest version of the registered model

# Load the model as a PySpark UDF
loaded_model_udf = mlflow.pyfunc.spark_udf(spark, model_uri, env_manager="conda")

The logs are in the attached stderr.txt:

  File "/home/trusted-service-user/cluster-env/env/lib/python3.10/site-packages/mlflow/pyfunc/", line 1069, in udf
  File "/home/trusted-service-user/cluster-env/env/lib/python3.10/site-packages/mlflow/pyfunc/", line 89, in prepare_env
    conda_env_path = os.path.join(local_path, self._config[ENV])
  File "/home/trusted-service-user/cluster-env/env/lib/python3.10/", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.10/", line 152, in _check_arg_types
    raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'
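For what it's worth, the TypeError itself just means os.path.join received a dict where it expected a path string: the env entry read from the model's MLmodel config (self._config[ENV] in the traceback) is apparently a dict rather than a filename. A minimal stdlib reproduction of that failure mode (the dict contents below are illustrative, not read from my model):

```python
import os

# Illustrative value: the `env` entry that pyfunc reads from the MLmodel
# config can be a dict of environment files rather than a single filename.
env_entry = {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}

try:
    # Same call shape as prepare_env() in the traceback.
    os.path.join("/tmp/model", env_entry)
except TypeError as exc:
    # Matches the error in stderr.txt:
    # join() argument must be str, bytes, or os.PathLike object, not 'dict'
    print(exc)
```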

I was able to find a relatively similar error reported elsewhere, but no solution for my case.
