How can we increase our Azure OpenAI service quota so that we can deploy multiple models across multiple LLM endpoints and run them in parallel?

avi yashchin 0 Reputation points
2024-09-06T15:43:35.3733333+00:00

We currently have 10M TPM (tokens per minute) on OpenAI. We're trying to get similar throughput on Azure but only have 200k TPM. This is a meaningful difference that effectively makes Azure OpenAI unusable for our purposes.

We have parallelized our solution, but Azure only allows us a single Azure OpenAI endpoint. If we could create 50 endpoints on Azure, we could achieve TPM similar to what we get on OpenAI; as it stands, however, we are limited to one LLM endpoint.

My question is: how can we increase our Azure OpenAI service quota so that we can deploy multiple models across multiple LLM endpoints and run them in parallel?
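For reference, the fan-out described above can be sketched as a simple round-robin dispatcher over several resource configurations. This is a minimal sketch; the endpoint URLs and deployment names are hypothetical placeholders, and each worker would build its own client from the config it receives.

```python
# Sketch: round-robin dispatch across several Azure OpenAI resources,
# assuming each resource has its own endpoint URL and deployment.
# All endpoint URLs and deployment names below are hypothetical placeholders.
from itertools import cycle

ENDPOINTS = [
    {"azure_endpoint": "https://res-1.openai.azure.com", "deployment": "gpt-4o"},
    {"azure_endpoint": "https://res-2.openai.azure.com", "deployment": "gpt-4o"},
    {"azure_endpoint": "https://res-3.openai.azure.com", "deployment": "gpt-4o"},
]

_rotation = cycle(ENDPOINTS)

def next_endpoint() -> dict:
    """Return the next resource config, spreading requests evenly."""
    return next(_rotation)
```

Each parallel worker would call `next_endpoint()` before issuing a request, so load is spread evenly and the aggregate TPM is roughly the sum of the per-resource quotas.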

Azure OpenAI Service
An Azure service that provides access to OpenAI's GPT-3 models with enterprise capabilities.

1 answer

Sort by: Most helpful
  1. Laziz 190 Reputation points Microsoft Employee
    2024-09-09T15:45:42.9233333+00:00

Hi avi yashchin, you may consider two options here:

    • Use a Global-Standard deployment, which provides higher TPM limits for GPT-4, GPT-4o and GPT-4o-mini: https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits#gpt-4o--gpt-4-turbo-global-standard
    • Or, to increase quota for a Standard deployment, open Azure OpenAI Studio, click Quota, choose your Azure subscription, and then click the "Request quota" button in the right corner (as shown in the attached screenshot). You will be forwarded to an online form where you can specify the new quota requirements; if approved, the relevant quota increase will be allocated.

    [Screenshot: "Request quota" button on the Quota page in Azure OpenAI Studio]
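Until a larger quota is approved, requests beyond the current TPM limit are throttled (HTTP 429), so clients typically retry with exponential backoff. A minimal sketch, assuming the caller wraps its SDK call in a zero-argument function; `RuntimeError` here is a stand-in for the SDK's rate-limit exception, and the delay values are illustrative:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a throttled call with exponential backoff plus jitter.

    `call` is any zero-argument function that raises on throttling;
    RuntimeError stands in here for the SDK's rate-limit exception.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Exponential backoff with jitter, capped at 30 seconds.
            time.sleep(min(base_delay * 2 ** attempt + random.random(), 30.0))
```

Backoff smooths over short bursts but cannot manufacture throughput; a sustained 10M-TPM workload still needs the quota increase or Global-Standard deployment described above.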

