I am trying to deploy an ML classification model on Azure using the GUI.
After registering/uploading the model in the portal, I deploy it to Azure Container Instances (ACI) with a custom entry script and conda dependencies.
Entry Script
```python
# Importing packages
import pandas as pd
import pickle
import regex
import json
import numpy as np
import sklearn
import os
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType


def init():
    global model
    global classes
    # AZUREML_MODEL_DIR points at the registered model's folder inside the container
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'randomForest50.pkl')
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    # Map the model's integer predictions (0/1) to class labels
    classes = lambda x: ["F", "M"][x]


input_sample = np.array([['Thomas', 'Anna']])
output_sample = np.array(['M', 'F'])


@input_schema('data', NumpyParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        namesList = json.loads(data)["data"]["names"]
        # preprocessing() is not shown in the original post
        pred = list(map(classes, model.predict(preprocessing(namesList))))
        return str(pred[0])
    except Exception as e:
        error = str(e)
        return error
```
Conda.yaml
```yaml
name: prediction
dependencies:
  - python=3.7
  - numpy
  - scikit-learn
  - pip:
      - azureml-defaults
      - pandas
      - pickle4
      - regex
      - inference-schema[numpy-support]
```
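None of the packages in this conda file are pinned, so each new image build can resolve newer versions than the previous (working) deployment did. scikit-learn's model-persistence documentation warns that a pickled estimator is not guaranteed to load under a different scikit-learn version, and `pickle.load` runs inside `init()`. That is one possible explanation for "same files, different result", not something the logs confirm. A sketch that renders the same file with exact pins is shown below; `pinned_conda_yaml` is a hypothetical helper, and looking versions up via `importlib.metadata` assumes Python 3.8+ in the environment where the model was trained.

```python
from importlib import metadata  # Python 3.8+; the importlib_metadata backport covers 3.7


def pinned_conda_yaml(name, python_version, conda_pkgs, pip_pkgs, versions=None):
    """Render a conda environment file with every package pinned to one version.

    `versions` maps a package's base name (without pip extras such as
    "[numpy-support]") to a version string. Packages missing from it are looked
    up in the *current* environment, so run this where the model was trained.
    """
    versions = dict(versions or {})

    def pin(pkg, sep):
        base = pkg.split("[", 1)[0]  # strip pip extras for the lookup
        if base not in versions:
            versions[base] = metadata.version(base)
        return f"{pkg}{sep}{versions[base]}"

    lines = [f"name: {name}", "dependencies:", f"  - python={python_version}"]
    lines += [f"  - {pin(pkg, '=')}" for pkg in conda_pkgs]  # conda pins use '='
    lines.append("  - pip:")
    lines += [f"      - {pin(pkg, '==')}" for pkg in pip_pkgs]  # pip pins use '=='
    return "\n".join(lines) + "\n"
```

For example, running `pinned_conda_yaml("prediction", "3.7", ["numpy", "scikit-learn"], ["azureml-defaults", "pandas", "regex", "inference-schema[numpy-support]"])` in the training environment would produce a pinned variant of the file above, so that rebuilding the image reproduces the same scikit-learn that pickled `randomForest50.pkl`.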
After deployment, the endpoint's deployment state goes to Unhealthy, and the logs show the container stuck in a loop: each gunicorn worker boots, reaches "Invoking user's init function", and is then replaced by a new worker. Logs:
```
2021-04-26T08:14:55,433967500+00:00 - rsyslog/run
2021-04-26T08:14:55,421414500+00:00 - iot-server/run
2021-04-26T08:14:55,540534600+00:00 - gunicorn/run
2021-04-26T08:14:55,646209100+00:00 - nginx/run
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2021-04-26T08:14:58,234212800+00:00 - iot-server/finish 1 0
2021-04-26T08:14:58,324505300+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 19.9.0
Listening at: http://127.0.0.1:31311 (62)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 89
SPARK_HOME not set. Skipping PySpark Initialization.
Initializing logger
2021-04-26 08:15:11,623 | root | INFO | Starting up app insights client
2021-04-26 08:15:11,624 | root | INFO | Starting up request id generator
2021-04-26 08:15:11,631 | root | INFO | Starting up app insight hooks
2021-04-26 08:15:11,632 | root | INFO | Invoking user's init function
worker timeout is set to 300
Booting worker with pid: 91
SPARK_HOME not set. Skipping PySpark Initialization.
Initializing logger
2021-04-26 08:15:29,014 | root | INFO | Starting up app insights client
2021-04-26 08:15:29,014 | root | INFO | Starting up request id generator
2021-04-26 08:15:29,014 | root | INFO | Starting up app insight hooks
2021-04-26 08:15:29,014 | root | INFO | Invoking user's init function
worker timeout is set to 300
Booting worker with pid: 98
SPARK_HOME not set. Skipping PySpark Initialization.
...
...
...
```
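The restart pattern in this log can be made explicit by splitting it at each "Booting worker with pid" line and checking whether that worker reached "Invoking user's init function" before the next one booted. The sketch below does that; `summarize_worker_boots` is a hypothetical helper, and the two marker strings are taken from the excerpt above, so other base images may word them differently.

```python
import re

# Marker strings copied from the log excerpt in the question.
BOOT_RE = re.compile(r"Booting worker with pid: (\d+)")
INIT_MARKER = "Invoking user's init function"


def summarize_worker_boots(log_text):
    """Return (pid, reached_init) for every gunicorn worker boot in log_text.

    A run of boots that each reach the init marker before the next boot
    suggests the worker is dying during or right after init().
    """
    boots = list(BOOT_RE.finditer(log_text))
    summary = []
    for i, match in enumerate(boots):
        # A worker's segment runs until the next boot (or the end of the log).
        end = boots[i + 1].start() if i + 1 < len(boots) else len(log_text)
        segment = log_text[match.end():end]
        summary.append((int(match.group(1)), INIT_MARKER in segment))
    return summary


if __name__ == "__main__":
    excerpt = """\
Booting worker with pid: 89
2021-04-26 08:15:11,632 | root | INFO | Invoking user's init function
worker timeout is set to 300
Booting worker with pid: 91
2021-04-26 08:15:29,014 | root | INFO | Invoking user's init function
worker timeout is set to 300
Booting worker with pid: 98
SPARK_HOME not set. Skipping PySpark Initialization.
"""
    for pid, reached_init in summarize_worker_boots(excerpt):
        print(pid, reached_init)
```

On this excerpt it reports workers 89 and 91 as having reached init before being replaced, and 98 as cut off by the end of the excerpt.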
I also tried deploying the model with the Python SDK, but that failed too, with the following message:
```
WebserviceException: WebserviceException:
Message: Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 98e464d4-5b15-4606-936f-a2625f7bd1fd
More information can be found using '.get_logs()'
Error:
{
"code": "AciDeploymentFailed",
"statusCode": 400,
"message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.\n\t1. Please check the logs for your container instance: d16. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.\n\t2. You can interactively debug your scoring file locally. Please refer to https://learn.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.\n\t3. You can also try to run image 20dd0f745f704eeb89ef4d52057871a0.azurecr.io/azureml/azureml_b9e8a2e66019f74c902eacced9684631 locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.",
"details": [
{
"code": "CrashLoopBackOff",
"message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.\n\t1. Please check the logs for your container instance: d16. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.\n\t2. You can interactively debug your scoring file locally. Please refer to https://learn.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.\n\t3. You can also try to run image 20dd0f745f704eeb89ef4d52057871a0.azurecr.io/azureml/azureml_b9e8a2e66019f74c902eacced9684631 locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information."
},
{
"code": "AciDeploymentFailed",
"message": "Your container application crashed. Please follow the steps to debug:\n\t1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.\n\t2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.\n\t3. You can also interactively debug your scoring file locally. Please refer to https://learn.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.\n\t4. View the diagnostic events to check status of container, it may help you to debug the issue.\n\"RestartCount\": 3\n\"CurrentState\": {\"state\":\"Waiting\",\"startTime\":null,\"exitCode\":null,\"finishTime\":null,\"detailStatus\":\"CrashLoopBackOff: Back-off restarting failed\"}\n\"PreviousState\": {\"state\":\"Terminated\",\"startTime\":\"2021-04-27T10:46:03.903Z\",\"exitCode\":111,\"finishTime\":\"2021-04-27T10:46:07.524Z\",\"detailStatus\":\"Error\"}\n\"Events\":\n{\"count\":1,\"firstTimestamp\":\"2021-04-27T10:42:37Z\",\"lastTimestamp\":\"2021-04-27T10:42:37Z\",\"name\":\"Pulling\",\"message\":\"pulling image \\\"20dd0f745f704eeb89ef4d52057871a0.azurecr.io/azureml/azureml_b9e8a2e66019f74c902eacced9684631@sha256:322ebafbe88e98b0f57104fd0afad08a5caf57cc5e7f64b3b629c3ea50f54bb3\\\"\",\"type\":\"Normal\"}\n{\"count\":1,\"firstTimestamp\":\"2021-04-27T10:44:15Z\",\"lastTimestamp\":\"2021-04-27T10:44:15Z\",\"name\":\"Pulled\",\"message\":\"Successfully pulled image \\\"20dd0f745f704eeb89ef4d52057871a0.azurecr.io/azureml/azureml_b9e8a2e66019f74c902eacced9684631@sha256:322ebafbe88e98b0f57104fd0afad08a5caf57cc5e7f64b3b629c3ea50f54bb3\\\"\",\"type\":\"Normal\"}\n{\"count\":4,\"firstTimestamp\":\"2021-04-27T10:44:40Z\",\"lastTimestamp\":\"2021-04-27T10:46:03Z\",\"name\":\"Started\",\"message\":\"Started container\",\"type\":\"Normal\"}\n{\"count\":4,\"firstTimestamp\":\"2021-04-27T10:44:43Z\",\"lastTimestamp\":\"2021-04-27T10:46:07Z\",\"name\":\"Killing\",\"message\":\"Killing container with id 5c5ddb266c4b38b1c306367712d9bec0687e5f6979e34afea7f6b943edf7db75.\",\"type\":\"Normal\"}\n"
}
]
}
InnerException None
```
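The useful facts in that error are buried inside a message string: the container was restarted 3 times and its previous run terminated with exit code 111. A small sketch that pulls them out of the parsed `Error:` JSON is shown below, assuming you have that JSON as a dict (for example by copying it from the exception text and calling `json.loads`). `container_crash_info` is a hypothetical helper, and the regexes assume the `"RestartCount": N` / `"PreviousState": {...}` layout seen above; it does not interpret what exit code 111 means.

```python
import json
import re

# Layout assumed from the AciDeploymentFailed message in the question.
RESTART_RE = re.compile(r'"RestartCount":\s*(\d+)')
PREV_STATE_RE = re.compile(r'"PreviousState":\s*(\{.*?\})')


def container_crash_info(error):
    """Extract restart count and last exit code from an ACI deployment error dict.

    Returns None if no detail message contains both fields.
    """
    for detail in error.get("details", []):
        msg = detail.get("message", "")
        restarts = RESTART_RE.search(msg)
        prev = PREV_STATE_RE.search(msg)
        if restarts and prev:
            # PreviousState is a flat JSON object, so a non-greedy {...} match suffices.
            state = json.loads(prev.group(1))
            return {
                "restart_count": int(restarts.group(1)),
                "last_exit_code": state.get("exitCode"),
                "last_status": state.get("detailStatus"),
            }
    return None
```

Applied to the error above, this should report a restart count of 3 and a last exit code of 111, which is the same crash loop the container log shows.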
I previously deployed the same model with the same entryScript.py and conda.yaml, and it worked fine.
I can't figure out what the issue is. Can anyone suggest how to solve this?