Lukáš Sitta Greetings!
rate_limit_exceeded: Rate limit is exceeded. Try again in 30 seconds. RunId: run_0HK5UP3zdrnH0hWszGkCcb53:
The error is caused by rate limiting, which APIs commonly apply to prevent abuse and ensure fair usage.
In your case, the error message indicates that you’ve exceeded the token rate limit of your current AI Services S0 pricing tier.
Have you checked whether you have exceeded the quota for your Azure OpenAI resources? You can view your quotas and limits in the Model Quota section of Azure AI Studio.
Please see "Manage and increase quotas for resources with Azure AI Studio" for more details.
You could also try increasing the limit on your deployment.
Also, see "Autoscale Azure AI limits" and let me know if that helps in your scenario.
To minimize issues related to rate limits, it's a good idea to use the following techniques:
- Set max_tokens and best_of to the minimum values that serve the needs of your scenario. For example, don’t set a large max_tokens value if you expect your responses to be small.
- Use quota management to increase TPM on deployments with high traffic, and to reduce TPM on deployments with limited needs.
- Implement retry logic in your application.
- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.
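As an illustration of the retry logic mentioned above, here is a minimal Python sketch, not an official implementation. It assumes your client raises some exception on a rate-limit response; `RateLimitError`, `parse_retry_after`, and `call_with_retry` are hypothetical names used only for this example. The sketch waits for the number of seconds given in the error text ("Try again in 30 seconds") when that hint is present, and otherwise falls back to exponential backoff.

```python
import random
import re
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit exception your client raises."""


def parse_retry_after(message, default=None):
    """Extract the wait hint from text like 'Try again in 30 seconds.'"""
    match = re.search(r"Try again in (\d+) seconds?", message)
    return float(match.group(1)) if match else default


def call_with_retry(func, max_attempts=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Call func(), retrying on RateLimitError up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            hinted = parse_retry_after(str(err))
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # used only when the error text carries no wait hint.
            backoff = min(max_delay, base_delay * 2 ** attempt)
            delay = hinted if hinted is not None else backoff
            # Add up to 10% random jitter so parallel clients do not
            # all retry at the same moment.
            delay += random.uniform(0, 0.1 * delay)
            sleep(delay)
```

To use it, wrap the call that hits the service in a function and pass it in, for example `call_with_retry(lambda: send_request(payload))`, where `send_request` is a placeholder for your own code and should raise (or be adapted to raise) the exception type you catch.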
If you have already increased the limits and are still seeing the issue, please contact support for further assistance.
Hope this helps. Do let me know if you have any further queries.