Error while scaling up the deployment
I deployed a model in Azure ML using the automated author, and now scaling it up from 1 instance to more than 1 fails every time. Both autoscale and manual scaling keep failing.
I also noticed that some properties changed automatically, such as the environment variables and the scale settings.
I am attaching a zip file below that contains the error message I am encountering, the properties file from when I could still scale the deployment successfully, and the most recent properties file, from after both autoscale and manual scaling started failing.
You will notice that in the latest properties file I attached, the max instance count is set to 1, but I believe this is just change analysis grouping together nearby changes to the deployment, including those made by the autoscaler.
Our autoscale policy was configured to scale to 2 instances between 5:30AM - 8:00 ET. In change analysis, before the issue began, I could see it automatically changing instanceCount, maxInstances, and minInstances between 1 and 2 each day.
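To separate the autoscaler's routine changes from anything unexpected, the two properties files could be compared key by key. This is only a rough sketch: the `before` and `after` dictionaries are hypothetical stand-ins rather than the real deployment output, and nesting minInstances and maxInstances under scaleSettings is an assumption about the file layout.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {"a.b[0].c": value} pairs."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        items[prefix] = obj
    return items


def diff_properties(before, after):
    """Return {path: (old, new)} for every leaf value that differs."""
    old, new = flatten(before), flatten(after)
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(old.keys() | new.keys())
        if old.get(path) != new.get(path)
    }


# Hypothetical snapshots; in practice these would be json.load() of the two files.
before = {"instanceCount": 2, "scaleSettings": {"minInstances": 1, "maxInstances": 2}}
after = {"instanceCount": 1, "scaleSettings": {"minInstances": 1, "maxInstances": 1}}

for path, (old_value, new_value) in diff_properties(before, after).items():
    print(f"{path}: {old_value} -> {new_value}")
```

Any key that shows up in the diff but is not one of the three instance-count fields (for example an environment variable) would be the more interesting thing to look at.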
After the issue started, we updated our autoscale policy to always target 2 instances, which includes setting max instances to 2:
"capacity": {
    "minimum": "2",
    "maximum": "2",
    "default": "2"
},
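As a sanity check on the policy itself, the capacity block can be parsed and tested for internal consistency. A minimal sketch, assuming the fragment is wrapped in braces so that it parses as JSON; the helper names `read_capacity` and `capacity_is_consistent` are made up for illustration and are not part of any Azure SDK.

```python
import json

# Capacity block from the autoscale profile, wrapped in braces so it parses.
profile_fragment = '{"capacity": {"minimum": "2", "maximum": "2", "default": "2"}}'


def read_capacity(fragment):
    """Return the capacity values as integers (the policy stores them as strings)."""
    capacity = json.loads(fragment)["capacity"]
    return {key: int(capacity[key]) for key in ("minimum", "maximum", "default")}


def capacity_is_consistent(cap):
    """True when minimum <= default <= maximum."""
    return cap["minimum"] <= cap["default"] <= cap["maximum"]


cap = read_capacity(profile_fragment)
print(cap, capacity_is_consistent(cap))
```

With the values above the check passes, which suggests the policy is well formed and the failure lies in the scale operation rather than in the profile.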
It keeps trying to scale with these settings, but every attempt still fails.
Manual scaling to 2 instances also still fails. I've attached the autoscale policy below. "Profile 1" used to specify 1 instance, but after the issue started we changed it to 2 so that it continuously tries to scale to 2 instances. I don't think the scaleSettings items in the deployment JSON will update to 2 if the scale operation fails immediately.