Share via

Deployed Huggingface model using managed compute, cannot increase max_concurrent_requests_per_instance. Always stays 0

Tony Mellios 0 Reputation points
2025-03-24T19:20:35.0233333+00:00

Hello,

I have used Azure AI foundry to deploy this model in the West US region

azureml://registries/HuggingFace/models/sentence-transformers-all-minilm-l6-v2/versions/3

I'm using and A100 instance.

Standard_NC24ads_A100_v4

The deployment is currently configured with a max concurrency of "0". I'm not really sure what this value means but we are observing that we're not getting 429's so I was trying to increase it.

{
  "app_insights_enabled": false,
  "creation_context": {
    "created_at": "2025-03-23T17:10:57.058258+00:00",
    "created_by": "Blah"
    "last_modified_at": "2025-03-23T17:10:57.058259+00:00"
  },
  "egress_public_network_access": "enabled",
  "endpoint_name": "ai-blah",
  "environment_variables": {},
  "id": "/subscriptions/BLAH/resourceGroups/rg-blah/providers/Microsoft.MachineLearningServices/workspaces/proj-blah/onlineEndpoints/ai-blah/deployments/sentence-transformers-all-min-3",
  "instance_count": 1,
  "instance_type": "Standard_NC24ads_A100_v4",
  "model": "azureml://registries/HuggingFace/models/sentence-transformers-all-minilm-l6-v2/versions/3",
  "name": "sentence-transformers-all-min-3",
  "properties": {
    "AzureAsyncOperationUri": "blah"
  },
  "provisioning_state": "Succeeded",
  "request_settings": {
    "max_concurrent_requests_per_instance": 0
  },
  "resourceGroup": "rg-blah",
  "scale_settings": {
    "type": "default"
  },
  "tags": {},
  "type": "managed"
}

I have tried amending this value to a higher one using the below command. The command returns but the value remains 0. I've tried multiple different values to no avail.

az ml online-deployment update --name sentence-transformers-all-min-3 --endpoint-name ai-hub-blah --resource-group rg-blah --workspace-name proj-blah --set request_settings.max_concurrent_requests_per_instance=4

Any help would be greatly appreciated.

Azure Machine Learning

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.