Multiple new errors when deploying to ACI webservice

JK2 11 Reputation points
2022-03-14T14:39:00.933+00:00

I deployed an ACI webservice on Azure ML a few months ago. The scoring script references an object in a container in Azure Blob Storage, and everything worked fine. I now want to increase the size of the deployment config, so I tried redeploying. It failed, and I don't know what changed. I didn't touch the code or the notebooks. It fails almost instantly, about 5 seconds into running Model.deploy.
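For context, a redeployment with a larger ACI configuration could look roughly like the sketch below. This is not the notebook from this post: the function name, its parameters, and the `cpu_cores=2` / `memory_gb=4` defaults are placeholders chosen for illustration, and it assumes the azureml-core SDK (v1) that appears in the traceback further down.

```python
def redeploy_with_larger_aci(workspace, model_names, entry_script,
                             environment_name, service_name,
                             cpu_cores=2, memory_gb=4):
    """Redeploy registered models to ACI with a larger deployment config.

    All argument values are supplied by the caller; the defaults for
    cpu_cores and memory_gb are hypothetical, not taken from the post.
    """
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from azureml.core import Environment
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AciWebservice

    # Look up the registered environment and bind it to the scoring script.
    env = Environment.get(workspace, name=environment_name)
    inference_config = InferenceConfig(entry_script=entry_script, environment=env)

    # The "size" of the deployment is set here.
    deployment_config = AciWebservice.deploy_configuration(
        cpu_cores=cpu_cores, memory_gb=memory_gb
    )

    models = [Model(workspace, name=name) for name in model_names]
    service = Model.deploy(
        workspace, service_name, models,
        inference_config, deployment_config, overwrite=True,
    )
    try:
        service.wait_for_deployment(show_output=True)
    finally:
        # Print container logs whether or not the deployment succeeded.
        print(service.get_logs())
    return service
```

Printing `service.get_logs()` in the `finally` block means the container output is available even when `wait_for_deployment` raises, which is the situation described here.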

I first get a

WebserviceException:
Message: Service deployment polling reached non-successful terminal state, current service state: Unhealthy

Then clicking on more info, I get: The specified blob does not exist. RequestId:f2302ade-901e-0014-667b-37d663000000 Time:2022-03-14T08:14:42.7864874Z

There's also this error in the logs: Reason: Container registry 0309abcc70a24ee8921ea4b9f73c3e96.azurecr.io not found. Should it not create a new container registry and a new Docker image if one does not exist? It did the many times I deployed a service before. Why is it still referencing an old container? It should have created a new one...
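One way to check whether the error refers to a stale registry is to compare the registry host in the message with the registry the workspace is currently attached to. The sketch below only does the string comparison; the helper names are made up for illustration, and it assumes the workspace's registry is available as an ARM resource ID (for example from `Workspace.get_details()["containerRegistry"]` in azureml-core).

```python
import re


def registry_in_error(message):
    """Return the *.azurecr.io host named in an error message, or None."""
    match = re.search(r"\b([a-z0-9]+\.azurecr\.io)\b", message, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


def registry_name_from_arm_id(arm_id):
    """Return the last path segment of an ARM resource ID (the registry name)."""
    return arm_id.rstrip("/").rsplit("/", 1)[-1].lower()


def error_matches_workspace_registry(message, workspace_registry_arm_id):
    """True if the registry in the error is the one the workspace points at."""
    host = registry_in_error(message)
    if host is None:
        return False
    registry_name = host.split(".", 1)[0]
    return registry_name == registry_name_from_arm_id(workspace_registry_arm_id)
```

If the two names differ, the environment definition is probably still pointing at an image in a registry that no longer exists. That is an inference from the error text, not something the logs confirm.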

The blob definitely exists, and it worked before, so the error doesn't make sense to me. It's not a code issue. I tried removing every registered model and endpoint, regenerated the access keys, and recreated the containers and blobs; same error. I also removed the old container registry, and that didn't help either.

It doesn't even work on a LocalWebservice. Again, nothing has changed in the scoring script or the deployment notebook.

Entire error:

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
FailedService deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 61feee21-ae87-4ab3-a973-eeaa8124011c
More information can be found here: https://wstextanalytic6896936843.blob.core.windows.net/azureml/ImageLogs/61feee21-ae87-4ab3-a973-eeaa8124011c/build.log?sv=2019-07-07&sr=b&sig=405cun7a1PV5afij4KfU0fYvCxp18IHIzB%2BvA1c1wpI%3D&st=2022-03-14T09%3A49%3A48Z&se=2022-03-14T17%3A54%3A48Z&sp=r
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 404,
  "message": "No definition exists for Environment with Name: textanalytics Version: Autosave_2022-03-14T09:29:27Z_5e5728c1 Reason: Container registry 0309abcc70a24ee8921ea4b9f73c3e96.azurecr.io not found. If private link is enabled in workspace, please verify ACR is part of private link and retry..",
  "details": []
}
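The "More information" link above is a SAS URL, so one thing worth ruling out is that the link had simply expired when it was opened. The sketch below reads the `st` (start) and `se` (expiry) query parameters from that URL using only the standard library; the function names are illustrative.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

BUILD_LOG_URL = (
    "https://wstextanalytic6896936843.blob.core.windows.net/azureml/ImageLogs/"
    "61feee21-ae87-4ab3-a973-eeaa8124011c/build.log"
    "?sv=2019-07-07&sr=b&sig=405cun7a1PV5afij4KfU0fYvCxp18IHIzB%2BvA1c1wpI%3D"
    "&st=2022-03-14T09%3A49%3A48Z&se=2022-03-14T17%3A54%3A48Z&sp=r"
)


def sas_window(url):
    """Return (start, expiry) of a SAS URL as timezone-aware UTC datetimes."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes values

    def parse(key):
        raw = query[key][0]
        return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

    return parse("st"), parse("se")


def sas_is_valid(url, now=None):
    """True if `now` (default: current UTC time) falls inside the SAS window."""
    now = now or datetime.now(timezone.utc)
    start, expiry = sas_window(url)
    return start <= now <= expiry


if __name__ == "__main__":
    start, expiry = sas_window(BUILD_LOG_URL)
    print("valid from", start.isoformat(), "until", expiry.isoformat())
```

The "specified blob does not exist" response was timestamped 08:14:42Z, so it may have come from a different link generated by an earlier attempt. If a link is opened inside its window and still returns that message, one possible reading is that build.log was never written because the image build did not start. That is a guess based on the "container registry not found" reason, not something the output confirms.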

---------------------------------------------------------------------------
WebserviceException                       Traceback (most recent call last)
/tmp/ipykernel_32412/349779865.py in <module>
      8 )
      9 
---> 10 service.wait_for_deployment(show_output=True)
     11 print(service.state)

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/webservice/webservice.py in wait_for_deployment(self, show_output, timeout_sec)
    917                     logs_response = 'Current sub-operation type not known, more logs unavailable.'
    918 
--> 919                 raise WebserviceException('Service deployment polling reached non-successful terminal state, current '
    920                                           'service state: {}\n'
    921                                           'Operation ID: {}\n'

WebserviceException: WebserviceException:
 Message: Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 61feee21-ae87-4ab3-a973-eeaa8124011c
More information can be found here: https://wstextanalytic6896936843.blob.core.windows.net/azureml/ImageLogs/61feee21-ae87-4ab3-a973-eeaa8124011c/build.log?sv=2019-07-07&sr=b&sig=405cun7a1PV5afij4KfU0fYvCxp18IHIzB%2BvA1c1wpI%3D&st=2022-03-14T09%3A49%3A48Z&se=2022-03-14T17%3A54%3A48Z&sp=r
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 404,
  "message": "No definition exists for Environment with Name: textanalytics Version: Autosave_2022-03-14T09:29:27Z_5e5728c1 Reason: Container registry 0309abcc70a24ee8921ea4b9f73c3e96.azurecr.io not found. If private link is enabled in workspace, please verify ACR is part of private link and retry..",
  "details": []
}
 InnerException None
 ErrorResponse 
{
    "error": {
        "message": "Service deployment polling reached non-successful terminal state, current service state: Unhealthy\nOperation ID: 61feee21-ae87-4ab3-a973-eeaa8124011c\nMore information can be found here: https://wstextanalytic6896936843.blob.core.windows.net/azureml/ImageLogs/61feee21-ae87-4ab3-a973-eeaa8124011c/build.log?sv=2019-07-07&sr=b&sig=405cun7a1PV5afij4KfU0fYvCxp18IHIzB%2BvA1c1wpI%3D&st=2022-03-14T09%3A49%3A48Z&se=2022-03-14T17%3A54%3A48Z&sp=r\nError:\n{\n  \"code\": \"AciDeploymentFailed\",\n  \"statusCode\": 404,\n  \"message\": \"No definition exists for Environment with Name: textanalytics Version: Autosave_2022-03-14T09:29:27Z_5e5728c1 Reason: Container registry 0309abcc70a24ee8921ea4b9f73c3e96.azurecr.io not found. If private link is enabled in workspace, please verify ACR is part of private link and retry..\",\n  \"details\": []\n}"
    }
}

EDIT: I removed everything and recreated everything. Now the deployment somehow reports an error in the entry script, even though I didn't touch it. Could someone please help, so that I can finally move on from this service?

{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Error in entry script, ImportError: cannot import name 'ParamSpec', please run print(service.get_logs()) to get details.",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Error in entry script, ImportError: cannot import name 'ParamSpec', please run print(service.get_logs()) to get details."
    }
  ]
}
 InnerException None
 ErrorResponse 
{
    "error": {
        "message": "Service deployment polling reached non-successful terminal state, current service state: Unhealthy\nOperation ID: ffaab603-0358-4b87-b1c9-8e5ee3390bf7\nMore information can be found using '.get_logs()'\nError:\n{\n  \"code\": \"AciDeploymentFailed\",\n  \"statusCode\": 400,\n  \"message\": \"Aci Deployment failed with exception: Error in entry script, ImportError: cannot import name 'ParamSpec', please run print(service.get_logs()) to get details.\",\n  \"details\": [\n    {\n      \"code\": \"CrashLoopBackOff\",\n      \"message\": \"Error in entry script, ImportError: cannot import name 'ParamSpec', please run print(service.get_logs()) to get details.\"\n    }\n  ]\n}"
    }
}
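Since the same payload is repeated in several wrappers, it can help to pull out just the error code, status code, and detail codes. The sketch below assumes the message text has the shape shown above (free text, then `Error:` on its own line, then a JSON object); the function names are illustrative and it uses only the standard library.

```python
import json


def parse_deployment_error(message):
    """Decode the JSON object that follows 'Error:' in a polling message.

    Returns None if the message has no 'Error:' section.
    """
    _, separator, payload = message.partition("Error:\n")
    if not separator:
        return None
    # strict=False tolerates raw control characters inside JSON strings.
    return json.loads(payload, strict=False)


def summarize(error):
    """Return (code, statusCode, [detail codes]) from a decoded error."""
    detail_codes = [detail.get("code") for detail in error.get("details", [])]
    return error.get("code"), error.get("statusCode"), detail_codes
```

With the two failures in this post, this would separate the first (404 with empty `details`, raised before any container starts) from the second (400 with `CrashLoopBackOff`, raised after the container starts and the entry script fails to import).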

Entire error log when testing on a LocalWebservice:

Container Logs:
2022-03-14T14:13:04,088795292+00:00 - rsyslog/run 
2022-03-14T14:13:04,096166698+00:00 - iot-server/run 
2022-03-14T14:13:04,096661705+00:00 - gunicorn/run 
Dynamic Python package installation is disabled.
Starting HTTP server
2022-03-14T14:13:04,096254199+00:00 - nginx/run 
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-03-14T14:13:04,184522169+00:00 - iot-server/finish 1 0
2022-03-14T14:13:04,186354496+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 20.1.0
Listening at: http://127.0.0.1:31311 (14)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 42
2022-03-14 14:13:04.894347: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib:/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib:
2022-03-14 14:13:04.894395: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
SPARK_HOME not set. Skipping PySpark Initialization.
Exception in worker process
Traceback (most recent call last):
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "/var/azureml-server/entry.py", line 1, in <module>
    import create_app
  File "/var/azureml-server/create_app.py", line 4, in <module>
    from routes_common import main
  File "/var/azureml-server/routes_common.py", line 32, in <module>
    from aml_blueprint import AMLBlueprint
  File "/var/azureml-server/aml_blueprint.py", line 28, in <module>
    main_module_spec.loader.exec_module(main)
  File "/var/azureml-app/arabic_sentiment/score.py", line 7, in <module>
    from azureml.core.model import Model
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azureml/core/__init__.py", line 13, in <module>
    from .workspace import Workspace
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azureml/core/workspace.py", line 22, in <module>
    from azureml._project import _commands
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azureml/_project/_commands.py", line 29, in <module>
    from azure.mgmt.resource import ResourceManagementClient
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/__init__.py", line 9, in <module>
    from .managedapplications import ApplicationClient
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/managedapplications/__init__.py", line 9, in <module>
    from ._application_client import ApplicationClient
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/managedapplications/_application_client.py", line 18, in <module>
    from .operations import ApplicationClientOperationsMixin, ApplicationDefinitionsOperations, ApplicationsOperations
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/managedapplications/operations/__init__.py", line 9, in <module>
    from ._application_client_operations import ApplicationClientOperationsMixin
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/managedapplications/operations/_application_client_operations.py", line 17, in <module>
    from azure.core.tracing.decorator import distributed_trace
  File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/core/tracing/decorator.py", line 31, in <module>
    from typing_extensions import ParamSpec
ImportError: cannot import name 'ParamSpec'
Worker exiting (pid: 42)
Shutting down: Master
Reason: Worker failed to boot.
2022-03-14T14:13:06,839815591+00:00 - gunicorn/finish 3 0
2022-03-14T14:13:06,841371413+00:00 - Exit code 3 is not normal. Killing image.

Error: Container has crashed. Did your init method fail?
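The traceback ends at `from typing_extensions import ParamSpec` inside azure/core/tracing/decorator.py, which suggests the `typing_extensions` package in the image is too old to provide `ParamSpec`. That points to a dependency version mismatch in the rebuilt environment rather than a change in score.py. A small check like the one below, run inside the same environment, would confirm whether the name is missing. The helper name is made up for illustration.

```python
import importlib


def check_import(module_name, attribute):
    """Report whether `attribute` can be found on an importable module."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return f"{module_name}: not importable ({exc})"
    # Not every package exposes __version__, so fall back to "unknown".
    version = getattr(module, "__version__", "unknown")
    status = "present" if hasattr(module, attribute) else "MISSING"
    return f"{module_name} {version}: {attribute} {status}"


if __name__ == "__main__":
    print(check_import("typing_extensions", "ParamSpec"))
```

If the result says MISSING, the usual options are to pin package versions in the environment definition so that `azure-core` and `typing_extensions` are compatible. Which versions to pin depends on the other packages in the image, so that is left open here.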

1 answer

  1. JK2 11 Reputation points
    2022-03-15T06:24:24.273+00:00

    More error logs (can't add in comments):

    Error:  
    {  
      "code": "AciDeploymentFailed",  
      "statusCode": 400,  
      "message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.  
    	1. Please check the logs for your container instance: aciservicesentimentar. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.  
    	2. You can interactively debug your scoring file locally. Please refer to https://learn.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.  
    	3. You can also try to run image f75691feb45c4b3c9a0d73442a23d99e.azurecr.io/azureml/azureml_5df2da553c194ecec033f24d7523db5f locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.",  
      "details": [  
        {  
          "code": "CrashLoopBackOff",  
          "message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.  
    	1. Please check the logs for your container instance: aciservicesentimentar. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.  
    	2. You can interactively debug your scoring file locally. Please refer to https://learn.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.  
    	3. You can also try to run image f75691feb45c4b3c9a0d73442a23d99e.azurecr.io/azureml/azureml_5df2da553c194ecec033f24d7523db5f locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information."  
        },  
        {  
          "code": "AciDeploymentFailed",  
          "message": "Your container application crashed. Please follow the steps to debug:  
    	1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.  
    	2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.  
    	3. You can also interactively debug your scoring file locally. Please refer to https://learn.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.  
    	4. View the diagnostic events to check status of container, it may help you to debug the issue.  
    "RestartCount": 3  
    "CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"}  
    "PreviousState": {"state":"Terminated","startTime":"2022-03-14T19:38:03.337Z","exitCode":111,"finishTime":"2022-03-14T19:38:11.513Z","detailStatus":"Error"}  
    "Events": null  
    "  
        }  
      ]  
    }