MLflow and Spark

Question

Hi, I ran into the following problem when trying to run Spark inference (via mlflow.pyfunc.spark_udf) with a model that I had previously trained in pandas and logged to MLflow.

I saved the model with:

import mlflow
import mlflow.catboost
from mlflow.tracking import MlflowClient
from azureml.core import Workspace

# Connect to your Azure ML workspace
ws = Workspace.from_config()  # Make sure you have a config.json file

# Set the tracking URI to Azure ML workspace
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

client = MlflowClient()

# Create an experiment and a run in it
experiment_name = "CatBoost_Experiment"
experiment_id = client.create_experiment(experiment_name)
run = client.create_run(experiment_id)

# Log and register the CatBoost model (`model` is the trained CatBoost model)
mlflow.catboost.log_model(model, "catboost_model_20", registered_model_name="catboost_model")

and loaded it with:

model_uri = "models:/catboost_model/latest"  # Use the latest version of the registered model

# Load the model as a PySpark UDF
mlflow.pyfunc.get_model_dependencies(model_uri)
loaded_model_udf = mlflow.pyfunc.spark_udf(spark, model_uri, env_manager="conda")

The full log is in the attached stderr.txt; the traceback ends with:

  File "/home/trusted-service-user/cluster-env/env/lib/python3.10/site-packages/mlflow/pyfunc/__init__.py", line 1069, in udf
    pyfunc_backend.prepare_env(
  File "/home/trusted-service-user/cluster-env/env/lib/python3.10/site-packages/mlflow/pyfunc/backend.py", line 89, in prepare_env
    conda_env_path = os.path.join(local_path, self._config[ENV])
  File "/home/trusted-service-user/cluster-env/env/lib/python3.10/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.10/genericpath.py", line 152, in _check_arg_types
    raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'
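
For what it's worth, the last frames can be reproduced outside of MLflow. The sketch below only mimics the failing `os.path.join(local_path, self._config[ENV])` call from the traceback; the function name, the path, and the dict contents are hypothetical, and I am assuming (based on the error message) that the env entry of my model's config is a dict rather than a file name.

```python
import os


def join_env_path(local_path, env_entry):
    """Mimic the failing call from the traceback:
    os.path.join(local_path, self._config[ENV])."""
    return os.path.join(local_path, env_entry)


# Hypothetical values, for illustration only.
local_path = "/tmp/model"

# A plain file name joins without problems.
print(join_env_path(local_path, "conda.yaml"))

# A dict in the same position raises the TypeError seen in the log.
try:
    join_env_path(local_path, {"conda": "conda.yaml"})
except TypeError as exc:
    print(f"TypeError: {exc}")
```

So the MLflow code on the cluster seems to expect a single file name where my logged model provides a dict. If that reading is right, it may be worth comparing `mlflow.__version__` in the environment that logged the model with the one on the Spark cluster.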

The closest report I was able to find describes a relatively similar error:

https://community.databricks.com/t5/machine-learning/logging-model-to-mlflow-using-feature-store-api-getting/td-p/7890
