MLflow Model Deployment Error - Can't Find MLmodel File in Registered Model

Sebastian Buzdugan 40 Reputation points
2024-12-17T10:38:21.2566667+00:00

Hi,

I'm trying to deploy a registered MLflow model in Azure ML. I've already fixed the environment issues by creating a custom environment with the correct dependencies for GPU support and tensor parallelism. However, now I'm getting an error in the scoring script (score.py) when trying to load the model.

(This follows on from an earlier issue I posted: https://learn.microsoft.com/en-us/answers/questions/2131992/deployment-on-azure-ml-with-tensor-parallelism-fai)

Error from logs:

ERROR:root:Error loading model: Could not find an "MLmodel" configuration file at "/var/azureml-app/azureml-models/FinancialChatbot/1/MLmodel"

My model directory structure (from deployment logs):

    mlflow_model_folder/
        MLmodel
        code/
        data/
        model/
            model-00001-of-00017.safetensors
            [... other model files ...]

How should I structure my score.py to load this registered model properly? The model is registered through the Azure ML Studio UI, but I'm not sure how to reference it correctly in the scoring script.
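
From the error path (/var/azureml-app/azureml-models/FinancialChatbot/1/MLmodel) it looks like AZUREML_MODEL_DIR resolves to the model version folder, while the MLmodel file actually sits one level down inside the registered folder. A minimal sketch of what I think the load should look like (the subfolder name mlflow_model_folder is taken from my deployment logs, so treat it as an assumption):

import os
import mlflow

def init():
    global model
    model_dir = os.getenv("AZUREML_MODEL_DIR")
    # assumption: the MLmodel file lives in the registered subfolder,
    # not at the root of AZUREML_MODEL_DIR
    model_path = os.path.join(model_dir, "mlflow_model_folder")
    model = mlflow.pyfunc.load_model(model_path)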

Current score.py approach:

def init():
    global model
    model_dir = os.getenv('AZUREML_MODEL_DIR')
    model = mlflow.pyfunc.load_model(model_dir)
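
In case the subfolder name differs between model versions, I also sketched a lookup that walks AZUREML_MODEL_DIR until it finds the directory holding the MLmodel file (unverified, just an idea):

import os

def find_mlmodel_root(model_dir):
    # return the first directory under model_dir that contains an MLmodel file
    for root, _dirs, files in os.walk(model_dir):
        if "MLmodel" in files:
            return root
    raise FileNotFoundError(f"No MLmodel file found under {model_dir}")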

Any guidance on how to properly load a registered MLflow model in the scoring script would be greatly appreciated.

My full score.py file, in case it's needed:

import os
import logging
import json
import mlflow
import torch

def init():
    """
    Initialize the model with tensor parallelism configuration
    """
    global model

    logging.info("Starting model initialization...")

    # fixing tensor parallelism
    os.environ["TENSOR_PARALLEL_SIZE"] = "3"

    model_dir = os.getenv("AZUREML_MODEL_DIR")
    mlflow.set_tracking_uri(f"file:{model_dir}")

    try:
        model = mlflow.pyfunc.load_model(model_dir)
        logging.info("Model loaded successfully")
    except Exception as e:
        logging.error(f"Error loading model: {str(e)}")
        raise

    # configure DeepSpeed settings
    ds_config = {
        "tensor_parallel": {
            "size": 3,
            "tp_overlap": True
        },
        "zero_optimization": {
            "stage": 3
        }
    }
    os.environ["DEEPSPEED_CONFIG"] = json.dumps(ds_config)

    logging.info("Initialization complete")

def run(raw_data):
    """
    Run inference on the input data
    """
    try:
        logging.info("Processing request...")
        input_data = json.loads(raw_data)

        result = model.predict(input_data)

        response = {
            "result": result,
            "status": "success"
        }

        logging.info("Request processed successfully")
        return json.dumps(response)

    except Exception as e:
        error_response = {
            "error": str(e),
            "status": "error"
        }
        logging.error(f"Error during inference: {str(e)}")
        return json.dumps(error_response)
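
For local debugging I exercise init() and run() directly, pointing AZUREML_MODEL_DIR at a downloaded copy of the registered model (the local path and the request payload below are made up for illustration):

import json
import os

# hypothetical local copy of the registered model version
os.environ["AZUREML_MODEL_DIR"] = "./FinancialChatbot/1"

init()
sample = json.dumps({"inputs": ["What drove the EUR/USD move today?"]})
print(run(sample))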


Thanks!

1 answer

  1. Sebastian Buzdugan 40 Reputation points
    2024-12-20T16:15:01.4+00:00

    Yes, it did not work. The only solution was to use the same expensive compute I used to train this model.
