Hi @Jens Madsen,
Thanks for using the Microsoft Q&A platform.
I understand that you are trying to retrieve the token limit of a privately deployed OpenAI LLM through an API call. I will be happy to assist you with this.
The API call below queries quota usage in a given region for a specific subscription. It is documented here:
Manage Azure OpenAI Service quota - Azure AI services | Microsoft Learn
- Query quota usage:
Please provide the subscription ID and location in the respective places in the command below:
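- GET command:
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.CognitiveServices/locations/{location}/usages?api-version=2023-05-01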
Here are the path parameters and types expected: subscriptionId (string) and location (string). The api-version query parameter is also a string.
Below is the example request:
- CURL command:
curl -X GET "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_AUTH_TOKEN"
Make sure to replace the placeholder subscription ID (00000000-0000-0000-0000-000000000000), the location (eastus), and YOUR_AUTH_TOKEN with your actual subscription ID, your region, and the access token you obtained through Azure AD or the Azure portal, respectively.
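If you would rather make the same call from a script, here is a minimal Python sketch of the request above. It uses only the standard library. The function names (build_usages_url, summarize_usages, fetch_usages) are my own illustrations, not part of any Azure SDK, and the response fields it reads (value, name.value, currentValue, limit) are assumed from the Usages - List reference, so please verify them against your actual response.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "2023-05-01"  # API version named in this answer


def build_usages_url(subscription_id, location, api_version=API_VERSION):
    """Return the usages URL for one subscription and region."""
    path = (
        f"/subscriptions/{urllib.parse.quote(subscription_id)}"
        "/providers/Microsoft.CognitiveServices"
        f"/locations/{urllib.parse.quote(location)}/usages"
    )
    query = urllib.parse.urlencode({"api-version": api_version})
    return f"https://management.azure.com{path}?{query}"


def summarize_usages(payload):
    """Turn a decoded response body into (name, current, limit) tuples.

    Assumption: the body looks like
    {"value": [{"name": {"value": ...}, "currentValue": ..., "limit": ...}]}.
    Missing fields come back as None rather than raising.
    """
    rows = []
    for item in payload.get("value", []):
        name = item.get("name", {}).get("value", "<unnamed>")
        rows.append((name, item.get("currentValue"), item.get("limit")))
    return rows


def fetch_usages(subscription_id, location, token):
    """Send the GET request and return the decoded JSON body."""
    request = urllib.request.Request(
        build_usages_url(subscription_id, location),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it, call fetch_usages with your subscription ID, region, and access token, then pass the result to summarize_usages to list each quota name with its current value and limit. One way to get a token, if you have the Azure CLI installed, is az account get-access-token.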
The quota documentation linked above also helps you get started with programmatically creating deployments that use quota to set TPM (tokens-per-minute) rate limits.
With the introduction of quota, you must use API version 2023-05-01 for resource-management-related activities.
For more detailed information about querying quota usage, see the reference document:
Usages - List - REST API (Azure Cognitive Services) | Microsoft Learn
I hope this information helps! Let me know if you have any further questions.