How can we increase our Azure OpenAI service quota so that we can deploy multiple models across multiple LLM endpoints and run them in parallel?

avi yashchin 0 Reputation points
2024-09-06T15:43:35.3733333+00:00

We currently have 10M TPM (tokens per minute) on OpenAI. We're trying to get similar throughput on Azure but only have 200k TPM. This is a meaningful difference that effectively makes Azure OpenAI unusable for our purposes.

We have parallelized our solution, but Azure only allows us a single Azure OpenAI endpoint. If we could create 50 endpoints on Azure, we could achieve TPM similar to what we get on OpenAI; as it stands, however, we are limited to one LLM endpoint.

My question is: how can we increase our Azure OpenAI service quota so that we can deploy multiple models across multiple LLM endpoints and run them in parallel?
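For reference, the fan-out described above can be sketched as a simple round-robin dispatcher over several resource configurations. This is a minimal sketch; the endpoint URLs and deployment names are hypothetical placeholders, and each worker would build its own client from the config it receives.

```python
# Sketch: round-robin dispatch across several Azure OpenAI resources,
# assuming each resource has its own endpoint URL and deployment.
# All endpoint URLs and deployment names below are hypothetical placeholders.
from itertools import cycle

ENDPOINTS = [
    {"azure_endpoint": "https://res-1.openai.azure.com", "deployment": "gpt-4o"},
    {"azure_endpoint": "https://res-2.openai.azure.com", "deployment": "gpt-4o"},
    {"azure_endpoint": "https://res-3.openai.azure.com", "deployment": "gpt-4o"},
]

_rotation = cycle(ENDPOINTS)

def next_endpoint() -> dict:
    """Return the next resource config, spreading requests evenly."""
    return next(_rotation)
```

Each parallel worker would call `next_endpoint()` before issuing a request, so load is spread evenly and the aggregate TPM is roughly the sum of the per-resource quotas.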

Azure OpenAI Service
An Azure service that provides access to OpenAI's GPT-3 models with enterprise capabilities.

1 answer

Sort by: Most helpful
  1. Laziz 190 Reputation points Microsoft Employee
    2024-09-09T15:45:42.9233333+00:00

Hi avi yashchin, you may consider two options here:

    • Use a Global-Standard deployment, which provides higher TPM limits for GPT-4, GPT-4o and GPT-4o-mini: https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits#gpt-4o--gpt-4-turbo-global-standard
    • Or, to increase quota for a Standard deployment, open Azure OpenAI Studio, click Quota, choose your Azure subscription, and then click the "Request quota" button in the right corner (as shown in the attached screenshot). You will be forwarded to an online form where you can specify the new quota requirements; if approved, the relevant quota increase will be allocated.

    [Screenshot: "Request quota" button on the Quota page in Azure OpenAI Studio]
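Until a larger quota is approved, requests beyond the current TPM limit are throttled (HTTP 429), so clients typically retry with exponential backoff. A minimal sketch, assuming the caller wraps its SDK call in a zero-argument function; `RuntimeError` here is a stand-in for the SDK's rate-limit exception, and the delay values are illustrative:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a throttled call with exponential backoff plus jitter.

    `call` is any zero-argument function that raises on throttling;
    RuntimeError stands in here for the SDK's rate-limit exception.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Exponential backoff with jitter, capped at 30 seconds.
            time.sleep(min(base_delay * 2 ** attempt + random.random(), 30.0))
```

Backoff smooths over short bursts but cannot manufacture throughput; a sustained 10M-TPM workload still needs the quota increase or Global-Standard deployment described above.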

