Error 404: AciDeploymentFailed

Florian Surmont 1 Reputation point
2021-10-25T18:40:34.1+00:00

Hello,

I am trying to deploy a machine learning model through an ACI (Azure Container Instances) service. I am working in Python and followed this code from the official documentation (https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=azcli):

from azureml.core import Workspace  

# Connect to workspace  
ws = Workspace(subscription_id="my-subscription-id",  
               resource_group="my-resource-group-name",
               workspace_name="my-workspace-name")  

from azureml.core.model import Model  

model = Model.register(workspace = ws,  
                       model_path= 'model.pkl',  
                       model_name = 'my-model',  
                       description = 'my-description')  



%%writefile score.py  

import os
import json
import dill
import joblib

def init():
    global model
    # AZUREML_MODEL_DIR is the folder where the registered model is mounted
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')

    # Load the registered model from that folder (not from the working directory)
    model = joblib.load(model_path)

# Handle request to the service
def run(data):
    try:
        # Pick out the text property of the JSON request
        # Expected JSON details {"text": "some text to evaluate"}
        data = json.loads(data)
        prediction = model.predict(data['text'])
        # NumPy arrays are not JSON-serializable; convert before returning
        return prediction.tolist()
    except Exception as e:  
        error = str(e)  
        return error  
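The `init()`/`run()` contract of `score.py` can be checked locally before any image is built. Below is a minimal sketch, assuming a hypothetical `StubModel` in place of the unpickled `model.pkl` and a list of strings under `"text"`. It returns a dict rather than a bare string so a caller can tell predictions from errors; that is a design choice for the sketch, not a requirement of the Azure ML entry-script API.

```python
import json


class StubModel:
    """Hypothetical stand-in for the model unpickled from model.pkl."""

    def predict(self, texts):
        # Toy rule: "predict" the length of each input string
        return [len(t) for t in texts]


model = None


def init():
    global model
    # The deployed script would instead load the real model, e.g.
    # joblib.load(os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl'))
    model = StubModel()


def run(raw_data):
    try:
        data = json.loads(raw_data)
        prediction = model.predict(data['text'])
        # Plain Python types keep the response JSON-serializable
        return {"prediction": list(prediction)}
    except Exception as e:
        return {"error": str(e)}


init()
print(run(json.dumps({"text": ["hello", "azure ml"]})))
# prints {'prediction': [5, 8]}
```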


from azureml.core.environment import Environment  

# Name environment and call requirements file  
# requirements: numpy, tensorflow  
myenv = Environment.from_pip_requirements(name = 'myenv', file_path = 'requirements.txt')  

from azureml.core.model import InferenceConfig  

# Create inference configuration  
inference_config = InferenceConfig(environment=myenv, entry_script='score.py')  

from azureml.core.webservice import AciWebservice #AksWebservice  

# Set the virtual machine capabilities  
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 0.5, memory_gb = 3)  


from azureml.core.model import Model  

# Deploy ML model (Azure Container Instances)  
service = Model.deploy(workspace=ws,  
                       name='my-service-name',  
                       models=[model],  
                       inference_config=inference_config,  
                       deployment_config=deployment_config)  

service.wait_for_deployment(show_output = True)  
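When `wait_for_deployment` ends in an `Unhealthy` state, the container logs often say more than the polling error. A minimal sketch, assuming `service` is the object returned by `Model.deploy` (azureml-core v1); `wait_and_collect_logs` is a hypothetical helper that only calls `wait_for_deployment()` and `get_logs()`, the latter being the method the deployment output itself suggests.

```python
def wait_and_collect_logs(service):
    """Wait for a deployed Webservice; if it fails, return its container logs.

    `service` is assumed to be the object returned by Model.deploy(...).
    Returns None when the deployment finishes without raising.
    """
    try:
        service.wait_for_deployment(show_output=True)
        return None
    except Exception as exc:  # azureml-core raises WebserviceException here
        state = getattr(service, "state", "unknown")
        print(f"Deployment failed (state={state}): {exc}")
        return service.get_logs()
```

Usage: `logs = wait_and_collect_logs(service)`, then print `logs` if it is not `None`.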

I succeeded once with this code. I noticed that Model.deploy had created a container registry named 6e07ce2cc4ac4838b42d35cda8d38616.
The API was working well, and I then wanted to deploy another model from scratch. I deleted the service and the model from Azure ML Studio, and the container registry from the Azure resources.

Unfortunately, I can no longer deploy anything.

At the last step (the Model.deploy call), I get the following error message:

Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 46243f9b-3833-4650-8d47-3ac54a39dc5e
More information can be found here: https://machinelearnin2812599115.blob.core.windows.net/azureml/ImageLogs/46245f8b-3833-4659-8d47-3ac54a39dc5e/build.log?sv=2019-07-07&sr=b&sig=45kgNS4sbSZrQH%2Fp29Rhxzb7qC5Nf1hJ%2BLbRDpXJolk%3D&st=2021-10-25T17%3A20%3A49Z&se=2021-10-27T01%3A24%3A49Z&sp=r
Error:
{
"code": "AciDeploymentFailed",
"statusCode": 404,
"message": "No definition exists for Environment with Name: myenv Version: Autosave_2021-10-25T17:24:43Z_b1d066bf Reason: Container registry 6e07ce2cc4ac4838b42d35cda8d38616.azurecr.io not found. If private link is enabled in workspace, please verify ACR is part of private link and retry..",
"details": []
}

I do not understand why a new container registry was created successfully the first time, while now the old one is looked up instead (the message says that the container registry named 6e07ce2cc4ac4838b42d35cda8d38616 is missing). I never found a way to force the creation of a new container registry resource in Python, nor to specify its name in AciWebservice.deploy_configuration or Model.deploy.

I then tried to create the container registry manually, but this time it is the container that cannot be created. The output is the following:

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-10-25 19:25:10+02:00 Creating Container Registry if not exists.
2021-10-25 19:25:10+02:00 Registering the environment.
2021-10-25 19:25:13+02:00 Building image..
2021-10-25 19:30:45+02:00 Generating deployment configuration.
2021-10-25 19:30:46+02:00 Submitting deployment to compute.
Failed

Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 93780de6-7662-40d8-ab9e-4e1556ef880f
Current sub-operation type not known, more logs unavailable.
Error:
{
"code": "InaccessibleImage",
"statusCode": 400,
"message": "ACI Service request failed. Reason: The image '6e07ce2cc4ac4838b42d35cda8d38616.azurecr.io/azureml/azureml_684133370d8916c87f6230d213976ca5' in container group 'my-service-name-LM4HbqzEBEi0LTXNqNOGFQ' is not accessible. Please check the image and registry credential.. Refer to https://learn.microsoft.com/azure/container-registry/container-registry-authentication#admin-account and make sure Admin user is enabled for your container registry."
}

I followed the recommendation in the last message and enabled the Admin user for the container registry. Unfortunately, the same error message appears again and I am stuck here...

Could anyone help me move on with this? I think the best solution would be to remove this 6e07ce2cc4ac4838b42d35cda8d38616 container registry reference entirely, but I can't find where the reference is set, so Model.deploy always fails to find it.

Another solution would be to force Model.deploy to create a new container registry, but I could not find how to do that.

I need your help !
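The registry name in the error does not come from `Model.deploy` or the deployment configuration but from the workspace itself. A minimal sketch for reading it back, assuming the azureml-core v1 SDK, where `Workspace.get_details()` returns a dict whose `containerRegistry` entry holds the ARM resource ID of the linked registry; both helper names are hypothetical.

```python
def acr_name_from_resource_id(resource_id):
    """Extract the registry name from an ARM ID of the form
    /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ContainerRegistry/registries/<name>
    """
    parts = resource_id.strip("/").split("/")
    # ARM IDs alternate key/value segments; the name follows "registries"
    idx = [p.lower() for p in parts].index("registries")
    return parts[idx + 1]


def linked_acr_name(ws):
    """Return the ACR name referenced by `ws` (an azureml.core.Workspace), or None."""
    acr_id = ws.get_details().get("containerRegistry")
    return acr_name_from_resource_id(acr_id) if acr_id else None
```

With the `ws` object from the code above, `linked_acr_name(ws)` would presumably return `6e07ce2cc4ac4838b42d35cda8d38616` if the workspace still references the deleted registry.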


2 answers

  1. romungi-MSFT 41,961 Reputation points Microsoft Employee
    2021-10-26T09:08:27.543+00:00

    @Florian Surmont The Azure container registry is actually created along with your workspace. Deleting the container registry or any other dependent resource, such as the storage account, Key Vault, or Application Insights, will cause the workspace to behave inconsistently.

    Since you have already deleted the registry and tried to attach a new one, it looks like the keys of this dependent resource are no longer synced with the workspace. Ideally, you would specify an existing registry when creating the workspace, with the following command:

    az ml workspace create -w <workspace-name> \
                           -g <resource-group-name> \
                           --container-registry "/subscriptions/<service-GUID>/resourceGroups/<resource-group-name>/providers/Microsoft.ContainerRegistry/registries/<acr-name>"
    

    Since the workspace already exists, you can use the az ml workspace update command instead to set the registry, and then sync the keys:

     az ml workspace sync-keys -w <workspace-name> -g <resource-group-name>  
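For readers working from Python rather than the CLI, azureml-core v1 appears to expose equivalents: `Workspace.update_dependencies(container_registry=..., force=...)` to point the workspace at another registry, and `Workspace.sync_keys()` to resynchronize keys. A minimal sketch, assuming those two methods exist in your installed SDK version (verify before relying on it) and that the target registry already exists with the Admin user enabled; `acr_resource_id` and `relink_registry` are hypothetical helpers.

```python
def acr_resource_id(subscription_id, resource_group, acr_name):
    """ARM resource ID of a container registry, as passed to --container-registry."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ContainerRegistry/registries/{acr_name}"
    )


def relink_registry(ws, subscription_id, resource_group, acr_name):
    """Point workspace `ws` at an existing ACR, then resync its keys.

    Assumes azureml-core v1 Workspace.update_dependencies() and
    Workspace.sync_keys(); check both exist in your SDK version.
    """
    acr_id = acr_resource_id(subscription_id, resource_group, acr_name)
    # force=True skips the interactive confirmation prompt
    ws.update_dependencies(container_registry=acr_id, force=True)
    ws.sync_keys()
    return acr_id
```

Usage: `relink_registry(ws, "my-subscription-id", "my-resource-group-name", "mynewregistry")`, where `mynewregistry` is a placeholder for the registry you created.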
    

    I have worked with another user on a similar issue before, but they had not enabled admin access on the registry. Since you have already done so, I think the steps above should sync the registry with your workspace, after which you can try to create or update a model from your experiment.

    I hope this can help.

    If an answer is helpful, please accept it or upvote it, which might help other community members reading this thread.


  2. Florian Surmont 1 Reputation point
    2021-10-26T10:29:45.23+00:00

    Thank you very much for your answer. I now understand where I made my mistake and what happened.

    Unfortunately, I am not able to run the command you suggested:

     az ml workspace sync-keys -w machinelearning -g oc-ingenieur-ia
    

    The CLI cannot find the workspace. The error message is the following:

    ProjectSystemException:
    Message: Workspace not found.
    InnerException None
    ErrorResponse
    {
    "error": {
    "message": "Workspace not found."
    }
    }

    When I run

    az ml workspace list
    

    The result is empty: [].

    This is strange, because the workspace does exist in the Azure portal.

    When I look up the resource group name with this command:

    az group list --subscription <my-subscription-id>
    

    I get this result:

    [
    {
    "id": "/subscriptions/<my-subscription-id>/resourceGroups/OC-ingenieur-IA",
    "location": "francecentral",
    "managedBy": null,
    "name": "OC-ingenieur-IA",
    "properties": {
    "provisioningState": "Succeeded"
    },
    "tags": {},
    "type": "Microsoft.Resources/resourceGroups"
    }
    ]

    But when I run either:

    az ml workspace list --resource-group OC-ingenieur-IA
    

    or

    az ml workspace list --resource-group oc-ingenieur-ia
    

    I get the following error:

    ProjectSystemException:
    Message: Workspaces not found.
    InnerException None
    ErrorResponse
    {
    "error": {
    "message": "Workspaces not found."
    }
    }

    Moreover, if I load the workspace from Python:

    ws = Workspace(subscription_id="my-subscription-id",
                   resource_group="oc-ingenieur-ia",
                   workspace_name="machinelearning")
    

    The Python object is created successfully and its name attribute is "machinelearning", as expected. I can use it to register a model with Model.register, and the model shows up in the Azure portal.

    Any idea of what is going on?
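Since the SDK finds the workspace while `az ml workspace list` returns `[]`, one way to narrow this down is to list workspaces from Python with the same subscription ID and compare. A minimal sketch, assuming azureml-core v1, where `Workspace.list(subscription_id, resource_group=...)` returns a dict mapping workspace names to lists of `Workspace` objects; the helper names are hypothetical. If Python lists `machinelearning` but the CLI does not, one unconfirmed possibility worth ruling out is that the CLI's active subscription (shown by `az account show`) differs from the one passed to the SDK.

```python
def summarize_workspaces(workspaces_by_name):
    """Flatten the {name: [Workspace, ...]} dict returned by Workspace.list()
    into sorted (name, resource_group) pairs."""
    return sorted(
        (ws.name, ws.resource_group)
        for workspaces in workspaces_by_name.values()
        for ws in workspaces
    )


def list_workspaces(subscription_id, resource_group=None):
    """List workspaces through the Python SDK (assumes azureml-core v1)."""
    from azureml.core import Workspace  # imported lazily; requires azureml-core

    return summarize_workspaces(
        Workspace.list(subscription_id, resource_group=resource_group)
    )
```

Usage: `print(list_workspaces("my-subscription-id"))`, then compare with `az ml workspace list` run against the same subscription.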