Azure Machine Learning profiling model errors

JIren 1 Reputation point
2022-08-17T14:57:52.003+00:00

I successfully completed the image-classification-mnist-data tutorial in the Azure Machine Learning samples:

Samples/1.43.0/tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb

I would now like to profile the resulting model as shown in this article: https://learn.microsoft.com/en-us/azure/machine-learning/v1/how-to-deploy-profile-model?pivots=py-sdk

However, I keep getting the following error:

    Running.....................................
    Failed
    /tmp/ipykernel_56534/2365332213.py:15: UserWarning: Model Profiling operation failed with the following error: Model service has failed with status: CrashLoopBackOff: Back-off restarting failed. This may be caused by errors in your scoring file's init() function. Error logs URL: Log upload failed. Request ID: b5384f0f-8a3a-4f53-908e-0a028374b924. Inspect ModelProfile.error property for more information.
      profile.wait_for_completion(True)
    {'name': 'sklearn-08172022-143854', 'createdTime': '2022-08-17T14:38:56.706085+00:00', 'state': 'Failed', 'requestedCpu': 3.5, 'requestedMemoryInGB': 15.0, 'requestedQueriesPerSecond': 0, 'error': {'code': 'ModelTestBackendCrashLoopBackoff', 'statusCode': 400, 'message': "Model service has failed with status: CrashLoopBackOff: Back-off restarting failed. This may be caused by errors in your scoring file's init() function. Error logs URL: Log upload failed.", 'details': []}}

I only have 1 model in my workspace model list. So why am I getting an error and how can I see the error that is thrown inside the scoring file?

score.py

    %%writefile score.py  
    import json  
    import numpy as np  
    import os  
    import pickle  
    import joblib  
      
    def init():  
        global model  
        # AZUREML_MODEL_DIR is an environment variable created during deployment.  
        # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)  
        # For multiple models, it points to the folder containing all deployed models (./azureml-models)  
        model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')  
        model = joblib.load(model_path)  
      
    def run(raw_data):  
        data = np.array(json.loads(raw_data)['data'])  
        # make prediction  
        y_hat = model.predict(data)  
        # you can return any data type as long as it is JSON-serializable  
        return y_hat.tolist()  
  

profiling.py

    import glob
    import os

    import numpy as np
    from azureml.core import Dataset
    from azureml.opendatasets import MNIST
    from utils import load_data
      
      
    data_folder = os.path.join(os.getcwd(), 'data')  
    os.makedirs(data_folder, exist_ok=True)  
      
    mnist_file_dataset = MNIST.get_file_dataset()  
    mnist_file_dataset.download(data_folder, overwrite=True)  
      
    data_folder = os.path.join(os.getcwd(), 'data')  
    # note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster  
    X_test = load_data(glob.glob(os.path.join(data_folder,"**/t10k-images-idx3-ubyte.gz"), recursive=True)[0], False) / 255.0  
    y_test = load_data(glob.glob(os.path.join(data_folder,"**/t10k-labels-idx1-ubyte.gz"), recursive=True)[0], True).reshape(-1)  
      
      
      
      
    import json  
    from azureml.core import Datastore  
    from azureml.core.dataset import Dataset  
    from azureml.data import dataset_type_definitions  
      
    random_index = np.random.randint(0, len(X_test)-1)  
    input_json = "{\"data\": [" + str(list(X_test[random_index])) + "]}"  
    # create a string that can be utf-8 encoded and  
    # put in the body of the request  
    serialized_input_json = json.dumps(input_json)  
    dataset_content = []  
    for i in range(100):  
        dataset_content.append(serialized_input_json)  
    dataset_content = '\n'.join(dataset_content)  
    file_name = 'sample_request_data.txt'  
    with open(file_name, 'w') as f:
        f.write(dataset_content)
      
    # upload the txt file created above to the Datastore and create a dataset from it  
    data_store = Datastore.get_default(ws)  
    data_store.upload_files(['./' + file_name], target_path='sample_request_data')  
    datastore_path = [(data_store, 'sample_request_data' +'/' + file_name)]  
    sample_request_data = Dataset.Tabular.from_delimited_files(  
        datastore_path, separator='\n',  
        infer_column_types=True,  
        header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)  
    sample_request_data = sample_request_data.register(workspace=ws,  
                                                       name='sample_request_data',  
                                                       create_new_version=True)  
      
      
      
    from azureml.core.model import InferenceConfig, Model  
    from azureml.core.dataset import Dataset  
    from datetime import datetime  
      
      
    model = Model(ws, id='sklearn_mnist:1')  
    inference_config = InferenceConfig(entry_script='score.py', environment=env)  
    input_dataset = Dataset.get_by_name(workspace=ws, name='sample_request_data')  
    profile = Model.profile(ws,  
                'sklearn-%s' % datetime.now().strftime('%m%d%Y-%H%M%S'),  
                [model],  
                inference_config,  
                input_dataset=input_dataset)  
      
    profile.wait_for_completion(True)  
      
    # see the result  
    details = profile.get_details()  
      
  
  
Azure Machine Learning

1 answer

  1. Ramr-msft 17,836 Reputation points
    2022-08-18T12:23:20.69+00:00

    @JIren Thanks for the question. Can you please add more details about the Azure ML SDK version that you are using?
    Currently, model profiling (the "Profile your model to determine resource utilization" article) applies only to CLI v1 and SDK v1. This profiling technique is not available in v2 of either the CLI or the SDK.
    It’s possible that uncaught exceptions in your init() function are triggering the CrashLoopBackOff error. Did you inspect the Docker logs for details?
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-deployment#service-launch-fails
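
Because the error log upload failed here, one way to see what init() raises is to call it locally before profiling. The following is only a minimal sketch, not part of the Azure ML SDK: the helper name `smoke_test_init` and the local model folder are assumptions for illustration. It loads the scoring script from disk, points AZUREML_MODEL_DIR at a local folder containing the model file, and returns the traceback text if the import or init() fails.

```python
import importlib.util
import os
import traceback


def smoke_test_init(script_path, model_dir):
    """Hypothetical helper: run init() from a scoring script locally.

    Returns the traceback text if importing the script or calling init()
    raises, or None if init() completes without an exception.
    """
    # The scoring script reads this variable to locate the model file.
    # Note: this changes the environment of the current process.
    os.environ["AZUREML_MODEL_DIR"] = model_dir

    spec = importlib.util.spec_from_file_location("score_under_test", script_path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)  # surfaces import errors in the script
        module.init()                    # surfaces errors raised while loading the model
    except Exception:
        return traceback.format_exc()
    return None
```

For example, `print(smoke_test_init("score.py", "./outputs") or "init() succeeded")`, assuming the model file was downloaded to `./outputs`. A clean local run does not rule out problems inside the container: a package missing from the deployment environment can only be caught by running this check in an equivalent environment.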
