Deployment timeout error in AKS and endpoint stuck in "Transitioning" state

Dan Marculescu 6 Reputation points
2021-09-06T05:38:25.21+00:00

We are deploying 170 ML models using ML studio and Azure Kubernetes Service (AKS), following the documentation at "https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/how-to-deploy-azure-kubernetes-service.md".

We train each model with a Python script in a custom environment and register it in Azure Machine Learning. Once a model is registered, we deploy it to AKS using container images.

We are able to deploy up to 10 to 11 models per pod for each node in AKS. When we try to deploy more models on the same node, we get a deployment timeout error with the error message below.

129464-deployment-error.png

We deploy the model to Azure Kubernetes Service with the Python sample code below.

    from azureml.core import Environment
    from azureml.core.compute import ComputeTarget
    from azureml.core.conda_dependencies import CondaDependencies
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AksWebservice

    # ws, Deployment_name, pip_packages and model_1 are defined earlier in our script.

    # Create an environment and add the conda dependencies to it;
    # the custom container image is built from this environment.
    myenv = Environment(name=Deployment_name)
    myenv.python.conda_dependencies = CondaDependencies.create(pip_packages=pip_packages)

    # Inference configuration
    inf_config = InferenceConfig(environment=myenv, entry_script='./Script_file.py')

    # Deployment configuration
    deployment_config = AksWebservice.deploy_configuration(
        cpu_cores=1, memory_gb=1, cpu_cores_limit=2, memory_gb_limit=2,
        traffic_percentile=10)

    # AKS cluster compute target
    aks_target = ComputeTarget(ws, 'pipeline')

    # Deploy the model to the AKS cluster
    service = Model.deploy(ws, Deployment_name, model_1, inf_config,
                           deployment_config, aks_target, overwrite=True)

    service.wait_for_deployment(show_output=True)
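
The deployment configuration above requests 1 CPU core and 1 GB of memory per service. Kubernetes places pods according to their resource requests, so one possible explanation for the ceiling (not confirmed anywhere in this thread) is that the node's allocatable CPU is used up after about ten such deployments. The sketch below is only back-of-the-envelope arithmetic: the node size (12 allocatable cores, 48 GB) and the helper name `max_replicas_per_node` are hypothetical, and it ignores system pods, the Azure ML components running on the cluster, and any per-node pod limit.

```python
import math


def max_replicas_per_node(alloc_cpu, alloc_mem_gb, cpu_request, mem_request_gb):
    """Upper bound on pods that fit on one node, counting resource requests only."""
    by_cpu = math.floor(alloc_cpu / cpu_request)
    by_mem = math.floor(alloc_mem_gb / mem_request_gb)
    # The scarcer resource decides how many pods can be scheduled.
    return min(by_cpu, by_mem)


# Hypothetical node with 12 allocatable cores and 48 GB allocatable memory.
# With cpu_cores=1 and memory_gb=1 (as in the sample code), CPU is the bottleneck.
print(max_replicas_per_node(12, 48, cpu_request=1, mem_request_gb=1))

# Hypothetical smaller CPU request: memory becomes the bottleneck instead.
print(max_replicas_per_node(12, 48, cpu_request=0.25, mem_request_gb=1))
```

If this arithmetic matches the real node size, lowering `cpu_cores` in `deploy_configuration` or adding nodes would be the levers to try; whether a smaller request is enough for each model has to be measured.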

We also checked the Azure documentation but could not find any configuration or deployment setting for AKS nodes.

Could you please clarify the statement "The number of models to be deployed is limited to 1,000 models per deployment (per container)"? Could you also advise how to increase the number of ML models that can be deployed on each node in Azure Kubernetes Service? Thanks!
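
For context, the quoted limit appears to concern a single deployment (one container) that serves several registered models, rather than one deployment per model. A minimal sketch of what the model-loading part of such an entry script might look like follows. It assumes the models are pickled, that each is stored as a file named `model.pkl` (a hypothetical convention), and that the files are laid out as `<AZUREML_MODEL_DIR>/<model name>/<version>/<file>`; the helper names `model_path` and `load_model` are also hypothetical.

```python
import os
import pickle
from functools import lru_cache


def model_path(name, version, filename="model.pkl"):
    """Build the path to one model file under AZUREML_MODEL_DIR.

    Assumed layout: <AZUREML_MODEL_DIR>/<name>/<version>/<filename>.
    """
    root = os.environ["AZUREML_MODEL_DIR"]
    return os.path.join(root, name, str(version), filename)


@lru_cache(maxsize=32)
def load_model(name, version, filename="model.pkl"):
    """Load a model on first use and keep at most 32 models in memory."""
    with open(model_path(name, version, filename), "rb") as f:
        return pickle.load(f)
```

In a real entry script, `run()` would read the model name from the request and call `load_model`. Loading lazily with a bounded cache keeps memory below the container's limit even when many models share one deployment, at the cost of a slower first request per model.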

Azure Container Registry
An Azure service that provides a registry of Docker and Open Container Initiative images.
Azure Machine Learning
An Azure machine learning service for building and deploying models.
Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

1 answer

  1. shiva patpi 13,156 Reputation points Microsoft Employee
    2021-09-06T18:02:32.08+00:00

    Hello @Dan Marculescu,
    Could you please take a look at this similar post, which was answered with the relevant documentation?
    https://learn.microsoft.com/en-us/answers/questions/540001/how-many-models-can-be-deployed-in-single-node-in.html

    Let us know if that helps!

    Regards,
    Shiva.

    1 person found this answer helpful.