I deployed an ACI webservice on Azure ML a few months ago. The scoring script references an object in an Azure Blob Storage container, and everything worked fine. I now want to increase the size of the deployment configuration, so I tried redeploying. It failed, and I don't know what changed: I didn't touch the code or the notebooks. It fails almost instantly, about 5 seconds into running Model.deploy.
I first get a
WebserviceException:
Message: Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Then, when I click through for more information, I get: The specified blob does not exist. RequestId:f2302ade-901e-0014-667b-37d663000000 Time:2022-03-14T08:14:42.7864874Z
There's also this error in the logs: Reason: Container registry 0309abcc70a24ee8921ea4b9f73c3e96.azurecr.io not found.
Shouldn't it create a new container registry and a new Docker image if one does not exist? It did so the many times I deployed a service before. Why is it still referencing the old one? It should have created a new one...
The blob definitely exists, and it worked before. The error doesn't make any sense, and it's not a code issue. I even tried removing every registered model and endpoint, regenerating the access keys, and recreating the containers and blobs; same error. I also removed the old container registry, which didn't help either.
It doesn't even work as a LocalWebservice. Again, nothing has changed in the scoring script or the deployment notebook...
Entire error:
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Failed
Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 61feee21-ae87-4ab3-a973-eeaa8124011c
More information can be found here: https://wstextanalytic6896936843.blob.core.windows.net/azureml/ImageLogs/61feee21-ae87-4ab3-a973-eeaa8124011c/build.log?sv=2019-07-07&sr=b&sig=405cun7a1PV5afij4KfU0fYvCxp18IHIzB%2BvA1c1wpI%3D&st=2022-03-14T09%3A49%3A48Z&se=2022-03-14T17%3A54%3A48Z&sp=r
Error:
{
"code": "AciDeploymentFailed",
"statusCode": 404,
"message": "No definition exists for Environment with Name: textanalytics Version: Autosave_2022-03-14T09:29:27Z_5e5728c1 Reason: Container registry 0309abcc70a24ee8921ea4b9f73c3e96.azurecr.io not found. If private link is enabled in workspace, please verify ACR is part of private link and retry..",
"details": []
}
---------------------------------------------------------------------------
WebserviceException Traceback (most recent call last)
/tmp/ipykernel_32412/349779865.py in <module>
8 )
9
---> 10 service.wait_for_deployment(show_output=True)
11 print(service.state)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/webservice/webservice.py in wait_for_deployment(self, show_output, timeout_sec)
917 logs_response = 'Current sub-operation type not known, more logs unavailable.'
918
--> 919 raise WebserviceException('Service deployment polling reached non-successful terminal state, current '
920 'service state: {}\n'
921 'Operation ID: {}\n'
WebserviceException: WebserviceException:
Message: Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 61feee21-ae87-4ab3-a973-eeaa8124011c
More information can be found here: https://wstextanalytic6896936843.blob.core.windows.net/azureml/ImageLogs/61feee21-ae87-4ab3-a973-eeaa8124011c/build.log?sv=2019-07-07&sr=b&sig=405cun7a1PV5afij4KfU0fYvCxp18IHIzB%2BvA1c1wpI%3D&st=2022-03-14T09%3A49%3A48Z&se=2022-03-14T17%3A54%3A48Z&sp=r
Error:
{
"code": "AciDeploymentFailed",
"statusCode": 404,
"message": "No definition exists for Environment with Name: textanalytics Version: Autosave_2022-03-14T09:29:27Z_5e5728c1 Reason: Container registry 0309abcc70a24ee8921ea4b9f73c3e96.azurecr.io not found. If private link is enabled in workspace, please verify ACR is part of private link and retry..",
"details": []
}
InnerException None
ErrorResponse
{
"error": {
"message": "Service deployment polling reached non-successful terminal state, current service state: Unhealthy\nOperation ID: 61feee21-ae87-4ab3-a973-eeaa8124011c\nMore information can be found here: https://wstextanalytic6896936843.blob.core.windows.net/azureml/ImageLogs/61feee21-ae87-4ab3-a973-eeaa8124011c/build.log?sv=2019-07-07&sr=b&sig=405cun7a1PV5afij4KfU0fYvCxp18IHIzB%2BvA1c1wpI%3D&st=2022-03-14T09%3A49%3A48Z&se=2022-03-14T17%3A54%3A48Z&sp=r\nError:\n{\n \"code\": \"AciDeploymentFailed\",\n \"statusCode\": 404,\n \"message\": \"No definition exists for Environment with Name: textanalytics Version: Autosave_2022-03-14T09:29:27Z_5e5728c1 Reason: Container registry 0309abcc70a24ee8921ea4b9f73c3e96.azurecr.io not found. If private link is enabled in workspace, please verify ACR is part of private link and retry..\",\n \"details\": []\n}"
}
}
EDIT: I removed everything and recreated EVERYTHING. Now, somehow, the entry script fails even though I didn't touch it. Someone please help so that I can move on from this service already.
{
"code": "AciDeploymentFailed",
"statusCode": 400,
"message": "Aci Deployment failed with exception: Error in entry script, ImportError: cannot import name 'ParamSpec', please run print(service.get_logs()) to get details.",
"details": [
{
"code": "CrashLoopBackOff",
"message": "Error in entry script, ImportError: cannot import name 'ParamSpec', please run print(service.get_logs()) to get details."
}
]
}
InnerException None
ErrorResponse
{
"error": {
"message": "Service deployment polling reached non-successful terminal state, current service state: Unhealthy\nOperation ID: ffaab603-0358-4b87-b1c9-8e5ee3390bf7\nMore information can be found using '.get_logs()'\nError:\n{\n \"code\": \"AciDeploymentFailed\",\n \"statusCode\": 400,\n \"message\": \"Aci Deployment failed with exception: Error in entry script, ImportError: cannot import name 'ParamSpec', please run print(service.get_logs()) to get details.\",\n \"details\": [\n {\n \"code\": \"CrashLoopBackOff\",\n \"message\": \"Error in entry script, ImportError: cannot import name 'ParamSpec', please run print(service.get_logs()) to get details.\"\n }\n ]\n}"
}
}
Entire error log when testing as a LocalWebservice:
Container Logs:
2022-03-14T14:13:04,088795292+00:00 - rsyslog/run
2022-03-14T14:13:04,096166698+00:00 - iot-server/run
2022-03-14T14:13:04,096661705+00:00 - gunicorn/run
Dynamic Python package installation is disabled.
Starting HTTP server
2022-03-14T14:13:04,096254199+00:00 - nginx/run
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-03-14T14:13:04,184522169+00:00 - iot-server/finish 1 0
2022-03-14T14:13:04,186354496+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 20.1.0
Listening at: http://127.0.0.1:31311 (14)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 42
2022-03-14 14:13:04.894347: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib:/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib:
2022-03-14 14:13:04.894395: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
SPARK_HOME not set. Skipping PySpark Initialization.
Exception in worker process
Traceback (most recent call last):
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 978, in _gcd_import
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
File "/var/azureml-server/entry.py", line 1, in <module>
import create_app
File "/var/azureml-server/create_app.py", line 4, in <module>
from routes_common import main
File "/var/azureml-server/routes_common.py", line 32, in <module>
from aml_blueprint import AMLBlueprint
File "/var/azureml-server/aml_blueprint.py", line 28, in <module>
main_module_spec.loader.exec_module(main)
File "/var/azureml-app/arabic_sentiment/score.py", line 7, in <module>
from azureml.core.model import Model
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azureml/core/__init__.py", line 13, in <module>
from .workspace import Workspace
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azureml/core/workspace.py", line 22, in <module>
from azureml._project import _commands
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azureml/_project/_commands.py", line 29, in <module>
from azure.mgmt.resource import ResourceManagementClient
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/__init__.py", line 9, in <module>
from .managedapplications import ApplicationClient
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/managedapplications/__init__.py", line 9, in <module>
from ._application_client import ApplicationClient
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/managedapplications/_application_client.py", line 18, in <module>
from .operations import ApplicationClientOperationsMixin, ApplicationDefinitionsOperations, ApplicationsOperations
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/managedapplications/operations/__init__.py", line 9, in <module>
from ._application_client_operations import ApplicationClientOperationsMixin
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/mgmt/resource/managedapplications/operations/_application_client_operations.py", line 17, in <module>
from azure.core.tracing.decorator import distributed_trace
File "/azureml-envs/azureml_ae7237d9cfbf5d852bb84bf47fdf5c24/lib/python3.6/site-packages/azure/core/tracing/decorator.py", line 31, in <module>
from typing_extensions import ParamSpec
ImportError: cannot import name 'ParamSpec'
Worker exiting (pid: 42)
Shutting down: Master
Reason: Worker failed to boot.
2022-03-14T14:13:06,839815591+00:00 - gunicorn/finish 3 0
2022-03-14T14:13:06,841371413+00:00 - Exit code 3 is not normal. Killing image.
Error: Container has crashed. Did your init method fail?
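The last frames of the traceback show azure/core/tracing/decorator.py failing on `from typing_extensions import ParamSpec`. That would happen if the environment resolved a typing_extensions release that predates ParamSpec, though I haven't confirmed which version was installed. A minimal sketch of a check that could be run inside the same environment is below. The helper name `probe` is hypothetical and not part of any Azure SDK, and the code uses only the standard library so it should also run on the Python 3.6 interpreter seen in the log.

```python
import importlib


def probe(module_name, attr):
    """Report whether `module_name` can be imported and defines `attr`.

    Returns a dict with the import result, the attribute check, and the
    file the module was loaded from (useful for spotting a stale copy).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # The module itself is missing or broken in this environment.
        return {"importable": False, "has_attr": False,
                "path": None, "error": str(exc)}
    return {
        "importable": True,
        "has_attr": hasattr(module, attr),
        "path": getattr(module, "__file__", None),
        "error": None,
    }


if __name__ == "__main__":
    # Assumption: this runs in the same conda environment as score.py.
    print(probe("typing_extensions", "ParamSpec"))
```

If `has_attr` comes back False, the reported `path` shows which copy of typing_extensions is being loaded. That would point at the environment's pinned or resolved dependencies rather than at score.py.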