Deployed Hugging Face model using managed compute: cannot increase max_concurrent_requests_per_instance, it always stays 0
Tony Mellios
Hello,
I have used Azure AI Foundry to deploy this model in the West US region:
azureml://registries/HuggingFace/models/sentence-transformers-all-minilm-l6-v2/versions/3
I'm using an A100 instance:
Standard_NC24ads_A100_v4
The deployment is currently configured with a max concurrency of "0" (request_settings.max_concurrent_requests_per_instance in the output below). I'm not sure what this value means, but we are observing that we're not getting 429s, so I was trying to increase it.
{
  "app_insights_enabled": false,
  "creation_context": {
    "created_at": "2025-03-23T17:10:57.058258+00:00",
    "created_by": "Blah",
    "last_modified_at": "2025-03-23T17:10:57.058259+00:00"
  },
  "egress_public_network_access": "enabled",
  "endpoint_name": "ai-blah",
  "environment_variables": {},
  "id": "/subscriptions/BLAH/resourceGroups/rg-blah/providers/Microsoft.MachineLearningServices/workspaces/proj-blah/onlineEndpoints/ai-blah/deployments/sentence-transformers-all-min-3",
  "instance_count": 1,
  "instance_type": "Standard_NC24ads_A100_v4",
  "model": "azureml://registries/HuggingFace/models/sentence-transformers-all-minilm-l6-v2/versions/3",
  "name": "sentence-transformers-all-min-3",
  "properties": {
    "AzureAsyncOperationUri": "blah"
  },
  "provisioning_state": "Succeeded",
  "request_settings": {
    "max_concurrent_requests_per_instance": 0
  },
  "resourceGroup": "rg-blah",
  "scale_settings": {
    "type": "default"
  },
  "tags": {},
  "type": "managed"
}
I have tried raising this value with the command below. The command returns, but the value remains 0. I have tried several different values, to no avail.
az ml online-deployment update --name sentence-transformers-all-min-3 --endpoint-name ai-hub-blah --resource-group rg-blah --workspace-name proj-blah --set request_settings.max_concurrent_requests_per_instance=4
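For anyone trying to reproduce this, here is a minimal sketch of how the result can be checked after each update. It assumes the JSON printed by `az ml online-deployment show` (same arguments as above, with `-o json`) has the same shape as the output pasted earlier. The helper names `reported_max_concurrency` and `update_took_effect` are hypothetical, made up for this illustration, and the sample string is a trimmed copy of the output above.

```python
import json


def reported_max_concurrency(deployment_json: str):
    """Return request_settings.max_concurrent_requests_per_instance, or None if absent."""
    deployment = json.loads(deployment_json)
    # request_settings may be missing or null, so fall back to an empty dict.
    request_settings = deployment.get("request_settings") or {}
    return request_settings.get("max_concurrent_requests_per_instance")


def update_took_effect(deployment_json: str, requested: int) -> bool:
    """True only if the deployment reports exactly the value that was requested."""
    return reported_max_concurrency(deployment_json) == requested


# Trimmed copy of the deployment output shown above (assumed shape).
sample = """
{
  "name": "sentence-transformers-all-min-3",
  "request_settings": {
    "max_concurrent_requests_per_instance": 0
  }
}
"""

print(update_took_effect(sample, 4))  # prints False: still 0 after requesting 4
```

With the output I get after running the update, this check keeps returning False for every value I have tried.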
Any help would be greatly appreciated.