azureml-core 1.44.0 fails to deploy model to webservice


Package: azureml-core 1.44.0
Environment: conda virtualenv (compute instance), Azure Machine Learning

Describe the bug

The model fails to deploy when I run the deployment code in an Azure notebook from a virtualenv with azureml-core 1.44.0.

It works fine with the previous version (1.43.0) and with the default "Python 3.8 - Azure ML" environment, which uses 1.42.0 at the moment.

The output:

   Running  
   2022-08-25 07:03:20+00:00 Creating Container Registry if not exists.  
   2022-08-25 07:03:20+00:00 Registering the environment.  
   2022-08-25 07:03:21+00:00 Use the existing image.  
   2022-08-25 07:03:22+00:00 Generating deployment configuration.  
   2022-08-25 07:03:23+00:00 Submitting deployment to compute.  
   2022-08-25 07:03:30+00:00 Checking the status of deployment heart-disease-classification-env..  
   2022-08-25 07:05:44+00:00 Checking the status of inference endpoint heart-disease-classification-env.  
   Failed  
   Service deployment polling reached non-successful terminal state, current service state: Failed  
   Operation ID: 3a980ad2-890e-4e8a-91d6-c119bd0528a4  
   More information can be found using '.get_logs()'  
   Error:  
   {  
     "code": "AciDeploymentFailed",  
     "statusCode": 400,  
     "message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.  
   	1. Please check the logs for your container instance: heart-disease-classification-env. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.  
   	2. You can interactively debug your scoring file locally. Please refer to https://learn.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.  
   	3. You can also try to run image 237a7cc8f2c84e1287a6cc08d5e54f9f.azurecr.io/azureml/azureml_09e362cde9760a4b66987389c8bbc20a locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.",  
     "details": [  
       {  
         "code": "CrashLoopBackOff",  
         "message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.  
   	1. Please check the logs for your container instance: heart-disease-classification-env. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.  
   	2. You can interactively debug your scoring file locally. Please refer to https://learn.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.  
   	3. You can also try to run image 237a7cc8f2c84e1287a6cc08d5e54f9f.azurecr.io/azureml/azureml_09e362cde9760a4b66987389c8bbc20a locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information."  
       },  
       {  
         "code": "AciDeploymentFailed",  
         "message": "Your container application crashed. Please follow the steps to debug:  
   	1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.  
   	2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.  
   	3. You can also interactively debug your scoring file locally. Please refer to https://learn.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.  
   	4. View the diagnostic events to check status of container, it may help you to debug the issue.  
   "RestartCount": 3  
   "CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"}  
   "PreviousState": {"state":"Terminated","startTime":"2022-08-25T07:07:34.619Z","exitCode":111,"finishTime":"2022-08-25T07:07:48.858Z","detailStatus":"Error"}  
   "Events":  
   {"count":1,"firstTimestamp":"2022-08-25T07:03:36Z","lastTimestamp":"2022-08-25T07:03:36Z","name":"Pulling","message":"pulling image "237a7cc8f2c84e1287a6cc08d5e54f9f.azurecr.io/azureml/azureml_09e362cde9760a4b66987389c8bbc20a@sha256:7650a3f19eb4803881637a920dc3e9bf9837c0e9c492b7d22be840d0ba8cb1cf"","type":"Normal"}  
   {"count":1,"firstTimestamp":"2022-08-25T07:05:15Z","lastTimestamp":"2022-08-25T07:05:15Z","name":"Pulled","message":"Successfully pulled image "237a7cc8f2c84e1287a6cc08d5e54f9f.azurecr.io/azureml/azureml_09e362cde9760a4b66987389c8bbc20a@sha256:7650a3f19eb4803881637a920dc3e9bf9837c0e9c492b7d22be840d0ba8cb1cf"","type":"Normal"}  
   {"count":4,"firstTimestamp":"2022-08-25T07:05:37Z","lastTimestamp":"2022-08-25T07:07:34Z","name":"Started","message":"Started container","type":"Normal"}  
   {"count":4,"firstTimestamp":"2022-08-25T07:05:54Z","lastTimestamp":"2022-08-25T07:07:48Z","name":"Killing","message":"Killing container with id 54971cd5cf0e6de46f30bd592bea94752d4ad857fb32f6d85e33b3a8bd4e4c92.","type":"Normal"}  
   "  
       }  
     ]  
   }
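
The diagnostic lines at the end of the error each have the form "Key": <JSON value>, so they can be read programmatically. Below is a minimal sketch, using only the standard library, that pulls out the restart count, the last exit code, and how long the container ran before it stopped. The name parse_diagnostics and the summary keys are hypothetical; the input lines are copied from the error above.

```python
import json
from datetime import datetime

# Diagnostic lines copied verbatim from the error message above.
DIAGNOSTICS = '''
"RestartCount": 3
"CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"}
"PreviousState": {"state":"Terminated","startTime":"2022-08-25T07:07:34.619Z","exitCode":111,"finishTime":"2022-08-25T07:07:48.858Z","detailStatus":"Error"}
'''

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_diagnostics(text):
    """Parse lines of the form '"Key": <json value>' into one dict (hypothetical helper)."""
    parsed = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            # Wrapping a single key/value pair in braces makes it a JSON object.
            parsed.update(json.loads("{" + line + "}"))
        except json.JSONDecodeError:
            # Skip anything that is not a complete key/value pair.
            continue
    return parsed


diag = parse_diagnostics(DIAGNOSTICS)
previous = diag["PreviousState"]
started = datetime.strptime(previous["startTime"], TIME_FORMAT)
finished = datetime.strptime(previous["finishTime"], TIME_FORMAT)

summary = {
    "restarts": diag["RestartCount"],
    "last_exit_code": previous["exitCode"],
    "ran_for_seconds": (finished - started).total_seconds(),
    "current_status": diag["CurrentState"]["detailStatus"],
}
print(summary)
```

For this error the summary shows 3 restarts, exit code 111, and a run of roughly 14 seconds, which is consistent with the container crashing during startup rather than while serving requests.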

To Reproduce

Steps to reproduce the behavior:

I use the standard heart-disease dataset, train the model, and export it to model/hd_otr.pkl.

In the assets folder I keep the outlierremover.py script that I use to remove outliers (it clips each column to its IQR-based bounds):

import pandas as pd  
from sklearn.base import BaseEstimator, TransformerMixin  

class OutlierRemover(BaseEstimator, TransformerMixin):  
    def __init__(self, factor=1.5):  
        self.factor = factor  

    def outlier_detector(self, X, y=None):  
        X = pd.Series(X).copy()  
        q1 = X.quantile(0.25)  
        q3 = X.quantile(0.75)  
        iqr = q3 - q1  
        self.lower_bound.append(q1 - (self.factor * iqr))  
        self.upper_bound.append(q3 + (self.factor * iqr))  

    def fit(self,X,y=None):  
        self.lower_bound = []  
        self.upper_bound = []  
        X.apply(self.outlier_detector)  
        return self  

    def transform(self, X, y=None):  
        X = pd.DataFrame(X).copy()  
        for i in range(X.shape[1]):  
            x = X.iloc[:, i].copy()  
            x[(x < self.lower_bound[i])] = self.lower_bound[i]  
            x[(x > self.upper_bound[i])] = self.upper_bound[i]  
            X.iloc[:, i] = x  
        return X  

outlier_remover = OutlierRemover()
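
To make the clipping rule concrete, here is a small worked example of the same arithmetic (bounds at q1 - factor * iqr and q3 + factor * iqr) on a single made-up column. It is a standard-library stand-in, not the class above: iqr_bounds and clip are hypothetical helpers, and statistics.quantiles with method="inclusive" is assumed to correspond to the linear interpolation that pandas' quantile uses by default.

```python
import statistics


def iqr_bounds(values, factor=1.5):
    """Return (lower, upper) clipping bounds for one column (hypothetical helper)."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def clip(values, lower, upper):
    """Cap each value to the range [lower, upper], as transform() does per column."""
    return [min(max(v, lower), upper) for v in values]


column = [1, 2, 3, 4, 100]  # made-up sample with one obvious outlier
lower, upper = iqr_bounds(column)
clipped = clip(column, lower, upper)
print(lower, upper, clipped)
```

Here q1 is 2 and q3 is 4, so the bounds are -1 and 7, and only the value 100 is changed (capped to 7).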

The score.py entry script:

   import joblib  
   from azureml.core.model import Model  
   import json  
   import pandas as pd  
   import numpy as np  
   from outlierremover import OutlierRemover  
     
   from inference_schema.schema_decorators import input_schema, output_schema  
   from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType  
   from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType  
   from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType  
     
   def init():  
       global model  
       # Example when the model is a file  
       model_path = Model.get_model_path('hd_otr') # logistic  
       print('Model Path is  ', model_path)  
       model = joblib.load(model_path)  
         
   data_sample = PandasParameterType(pd.DataFrame({'age': pd.Series([71], dtype='int64'),  
                                                   'sex': pd.Series(['0'], dtype='object'),  
                                                   'cp': pd.Series(['0'], dtype='object'),  
                                                   'trestbps': pd.Series([112], dtype='int64'),  
                                                   'chol': pd.Series([203], dtype='int64'),  
                                                   'fbs': pd.Series(['0'], dtype='object'),  
                                                   'restecg': pd.Series(['1'], dtype='object'),  
                                                   'thalach': pd.Series([185], dtype='int64'),  
                                                   'exang': pd.Series(['0'], dtype='object'),  
                                                   'oldpeak': pd.Series([0.1], dtype='float64'),  
                                                   'slope': pd.Series(['2'], dtype='object'),  
                                                   'ca': pd.Series(['0'], dtype='object'),  
                                                   'thal': pd.Series(['2'], dtype='object')}))  
     
   input_sample = StandardPythonParameterType({'data': data_sample})  
   result_sample = NumpyParameterType(np.array([0]))  
   output_sample = StandardPythonParameterType({'Results': result_sample})  
     
   @input_schema('Inputs', input_sample)  
   @output_schema(output_sample)  
   def run(Inputs):  
       try:  
           data = Inputs['data']  
           #result = model.predict_proba(data)  
           result = np.round(model.predict_proba(data)[0][0], 2)  
           return result.tolist()  
       except Exception as e:  
           error = str(e)  
           return error

In the deployment.ipynb notebook the code is as follows:

from azureml.core import Workspace  
from azureml.core.webservice import AciWebservice  
from azureml.core.webservice import Webservice  
from azureml.core.model import InferenceConfig  
from azureml.core.environment import Environment  
from azureml.core.model import Model  
from azureml.core.conda_dependencies import CondaDependencies  

ws = Workspace.from_config()  

model = Model.register(workspace = ws,  
              model_path ='model/hd_otr.pkl',  
              model_name = 'hd_otr',  
              tags = {'version': '1'},  
              description = 'Heart disease classification with outliers detection',  
              )  

# to install required packages  
env = Environment('env')  
cd = CondaDependencies.create(pip_packages=['pandas', 'azureml-defaults', 'joblib', 'inference-schema', 'imbalanced-learn'], conda_packages = ['scikit-learn'])  
env.python.conda_dependencies = cd  

# register environment to re-use later  
env.register(workspace = ws)  

myenv = Environment.get(workspace=ws, name='env')  

myenv.save_to_directory('./environ', overwrite=True)  

aciconfig = AciWebservice.deploy_configuration(  
            cpu_cores=1,  
            memory_gb=1,  
            tags={'data':'heart disease classifier'},  
            description='Classification of heart diseases'  
            )  

inference_config = InferenceConfig(entry_script='score.py', environment=myenv, source_directory='./assets')  

service = Model.deploy(workspace=ws,  
                name='heart-disease-classification-env',  
                models=[model],  
                inference_config=inference_config,  
                deployment_config=aciconfig,   
                overwrite=True)  

service.wait_for_deployment(show_output=True)  
url = service.scoring_uri  
print(url)

This gives the error shown above with 1.44.0 but works fine with the older versions.
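
Since the behaviour depends on the SDK version, it can help to confirm which azureml-core version the active kernel actually imports before deploying. A minimal sketch using importlib.metadata from the standard library (Python 3.8+); the helper names and the version sets are hypothetical and only encode what is reported in this post.

```python
from importlib import metadata

# Versions as reported in this post; nothing else has been checked.
WORKING = {"1.42.0", "1.43.0"}
FAILING = {"1.44.0"}


def installed_version(package="azureml-core"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def reported_status(version):
    """Map a version string to the behaviour reported in this post."""
    if version in WORKING:
        return "deploys"
    if version in FAILING:
        return "fails"
    return "unknown"


version = installed_version()
print(version, reported_status(version))
```

Running this in both the virtualenv kernel and the default kernel shows whether the two really differ in the SDK version they load.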
