ADM-Akash.Yadav, greetings and welcome to the Microsoft Q&A forum!
I need to know some information about OpenAI service scaling. I need to scale my OpenAI service to accommodate more users because there are multiple users who have access to it and there are access token limitations.
Can you add more details? Are you looking for suggestions to increase the quota?
Scaling the Azure OpenAI service for multiple users depends on the level of isolation and performance that you want to achieve. I would suggest you check Multitenancy and Azure OpenAI Service for more details.
To scale your Azure OpenAI service to accommodate more users, you can apply for a quota increase. See Azure OpenAI Service quotas and limits for more details.
Quota increase requests can be submitted from the Quotas page of Azure OpenAI Studio. Please note that due to overwhelming demand, quota increase requests are being accepted and will be filled in the order they are received. Priority will be given to customers who generate traffic that consumes the existing quota allocation, and your request may be denied if this condition is not met.
For other rate limits, please submit a service request.
To minimize issues related to rate limits, it's a good idea to use the following techniques:
- Implement retry logic in your application.
- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.
- Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.
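As a rough illustration of the first two techniques, here is a minimal Python sketch, not an official sample. It assumes your client surfaces HTTP 429 responses as an exception; RateLimitError below is a hypothetical stand-in for that error, and call_with_retry and ramp_schedule are illustrative helper names, not part of any Azure SDK. The retry helper waits with exponential backoff plus jitter (or for the service's suggested wait, if one is provided), and the ramp helper produces a gradually increasing request rate.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the HTTP 429 error your client library raises."""

    def __init__(self, message="rate limited", retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds suggested by the service, if any


def call_with_retry(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Call func(); on RateLimitError, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError as err:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error to the caller
            # Prefer the service's hint; otherwise double the delay each attempt.
            delay = err.retry_after
            if delay is None:
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay += random.uniform(0, delay / 2)  # jitter spreads out retries
            sleep(delay)


def ramp_schedule(start_rpm, target_rpm, steps):
    """Return requests-per-minute levels rising linearly from start to target."""
    if steps < 2:
        return [target_rpm]
    step = (target_rpm - start_rpm) / (steps - 1)
    return [round(start_rpm + i * step) for i in range(steps)]
```

In practice you would wrap each request in call_with_retry, catching whatever rate limit exception your client actually raises, and step through the levels from ramp_schedule while load testing instead of jumping straight to peak traffic.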
Also, see Manage Azure OpenAI Service quota for more details.
Do let me know if that helps or if you have any other queries.
If the response helped, please click "Accept Answer" and "Yes" for "Was this answer helpful".
Doing so would help other community members with a similar issue identify the solution. I highly appreciate your contribution to the community.