MLflow Model Deployment Error - Can't Find MLmodel File in Registered Model
Hi,
I'm trying to deploy a registered MLflow model in Azure ML. I've already fixed the environment issues by creating a custom environment with the correct dependencies for GPU support and tensor parallelism. However, now I'm getting an error in the scoring script (score.py) when trying to load the model.
(I'm continuing from this earlier question: https://learn.microsoft.com/en-us/answers/questions/2131992/deployment-on-azure-ml-with-tensor-parallelism-fai)
Error from logs:
ERROR:root:Error loading model: Could not find an "MLmodel" configuration file at "/var/azureml-app/azureml-models/FinancialChatbot/1/MLmodel"
My model directory structure (from deployment logs):
mlflow_model_folder/
    MLmodel
    code/
    data/
    model/
        model-00001-of-00017.safetensors
        [... other model files ...]
Comparing the error path with this structure, it looks like AZUREML_MODEL_DIR resolves to /var/azureml-app/azureml-models/FinancialChatbot/1/, while the MLmodel file actually sits one level down inside mlflow_model_folder/. How should I structure my score.py to load this registered model correctly? The model is registered through the Azure ML Studio UI, but I'm not sure how to reference it in the scoring script.
Current score.py approach:
def init():
    global model
    model_dir = os.getenv('AZUREML_MODEL_DIR')
    model = mlflow.pyfunc.load_model(model_dir)
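Given the path mismatch, I'm wondering whether init() should join the subfolder onto AZUREML_MODEL_DIR before loading. A minimal sketch of what I have in mind, assuming the subfolder under the registration root is literally named mlflow_model_folder as in the listing above (that name is my guess at the right path component):

import os
import mlflow

def init():
    global model
    # AZUREML_MODEL_DIR points at the registration root, e.g.
    # /var/azureml-app/azureml-models/FinancialChatbot/1/
    root_dir = os.getenv("AZUREML_MODEL_DIR")
    # Join the subfolder that actually contains the MLmodel file
    # ("mlflow_model_folder" is an assumption based on the listing above).
    model_path = os.path.join(root_dir, "mlflow_model_folder")
    model = mlflow.pyfunc.load_model(model_path)

Is joining the subfolder like this the right direction?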
Any guidance on how to properly load a registered MLflow model in the scoring script would be greatly appreciated.
My full score.py, in case it helps:
import os
import logging
import json

import mlflow
import torch


def init():
    """
    Initialize the model with tensor parallelism configuration
    """
    global model
    logging.info("Starting model initialization...")

    # Fix tensor parallelism
    os.environ["TENSOR_PARALLEL_SIZE"] = "3"

    model_dir = os.getenv("AZUREML_MODEL_DIR")
    mlflow.set_tracking_uri(f"file:{model_dir}")

    try:
        model = mlflow.pyfunc.load_model(model_dir)
        logging.info("Model loaded successfully")
    except Exception as e:
        logging.error(f"Error loading model: {str(e)}")
        raise

    # Configure DeepSpeed settings
    ds_config = {
        "tensor_parallel": {
            "size": 3,
            "tp_overlap": True
        },
        "zero_optimization": {
            "stage": 3
        }
    }
    os.environ["DEEPSPEED_CONFIG"] = json.dumps(ds_config)
    logging.info("Initialization complete")


def run(raw_data):
    """
    Run inference on the input data
    """
    try:
        logging.info("Processing request...")
        input_data = json.loads(raw_data)
        result = model.predict(input_data)
        response = {
            "result": result,
            "status": "success"
        }
        logging.info("Request processed successfully")
        return json.dumps(response)
    except Exception as e:
        error_response = {
            "error": str(e),
            "status": "error"
        }
        logging.error(f"Error during inference: {str(e)}")
        return json.dumps(error_response)
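In case the subfolder name isn't stable across registrations, I've also considered locating the MLmodel file dynamically rather than hard-coding the path. A sketch using only the standard library plus mlflow (find_mlflow_root is a hypothetical helper of mine, not an Azure ML or MLflow API):

import os
import mlflow

def find_mlflow_root(base_dir):
    """Return the first directory under base_dir that contains an MLmodel file."""
    for current_dir, _subdirs, files in os.walk(base_dir):
        if "MLmodel" in files:
            return current_dir
    raise FileNotFoundError(f"No MLmodel file found under {base_dir}")

# Inside init(), this would replace the hard-coded path:
# model = mlflow.pyfunc.load_model(find_mlflow_root(os.getenv("AZUREML_MODEL_DIR")))

Would that be a reasonable fallback, or is there a supported way to get the MLflow model path directly?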
Thanks!