I'm building a CI/CD pipeline in Azure DevOps to deploy my machine learning model to Azure Kubernetes Service (AKS). My YAML pipeline file contains the following task (I have replaced some of the values with '...'):
- task: AzureCLI@1
  displayName: "Deploy to AKS"
  inputs:
    azureSubscription: '...'
    scriptLocation: inlineScript
    workingDirectory: $(Build.SourcesDirectory)/score
    inlineScript: |
      set -e # fail on error
      az ml model deploy --name 'aks-deploy-test' --model '$(MODEL_NAME):$(get_model.MODEL_VERSION)' \
        --compute-target $(AKS_COMPUTE_NAME) \
        --ic inference_config.yml \
        --dc deployment_config_aks.yml \
        -g ... --workspace-name ... \
        --overwrite -v
The first time I run the pipeline, it deploys the ML model successfully and I can see the endpoint in the Azure ML workspace. However, when I run the pipeline a second time (to deploy a newer version of the model), I get this error:
Error:
{
  "code": "KubernetesError",
  "statusCode": 400,
  "message": "Kubernetes Deployment Error",
  "details": [
    {
      "code": "Unschedulable",
      "message": "0/6 nodes are available: 4 Insufficient cpu, 6 Insufficient memory."
    },
    {
      "code": "DeploymentFailed",
      "message": "Couldn't schedule because the kubernetes cluster didn't have available resources after trying for 00:05:00.\nYou can address this error by either adding more nodes, changing the SKU of your nodes or changing the resource requirements of your service.\nPlease refer to https://aka.ms/debugimage#container-cannot-be-scheduled for more information."
    }
  ]
}
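To make the "Unschedulable" message concrete, here is a minimal shell sketch that breaks it down per resource. It assumes the message always has the form "0/N nodes are available: <count> Insufficient <resource>, ..." as in the error above; the function name `summarize_unschedulable` is hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a Kubernetes "Unschedulable" message.
# Assumes the format "0/N nodes are available: <count> Insufficient <resource>, ...".
summarize_unschedulable() {
  msg="$1"
  # The total node count is the denominator of the leading "0/N".
  total=$(printf '%s\n' "$msg" | sed -E 's|^[0-9]+/([0-9]+) nodes.*|\1|')
  # Each "<count> Insufficient <resource>" pair names a resource and how many nodes lack it.
  printf '%s\n' "$msg" | grep -oE '[0-9]+ Insufficient [a-z]+' |
    while read -r count _ resource; do
      echo "$resource: $count of $total nodes short"
    done
}

summarize_unschedulable '0/6 nodes are available: 4 Insufficient cpu, 6 Insufficient memory.'
```

Read this way, none of the 6 nodes has enough free memory for the new pods, and 4 of them also lack enough free CPU.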
Isn't the --overwrite option of the az ml model deploy command supposed to completely overwrite the current deployment of the model? If so, why am I still getting this error? Or is there a better way to deploy a newer version of the ML model to the same AKS cluster?