homin wang Greetings & Welcome to Microsoft Q&A forum!
There are no known issues reported at the moment, there are a few things you can do to troubleshoot and mitigate the issue.
You can check if there are any changes in the usage of your Azure resources that might be causing the slowdown. You can use the Azure portal to monitor your resource usage and costs. You can also use the Azure Monitor to collect and analyze metrics and logs for your Azure resources.
This can help you identify any spikes in resource usage or other issues that might be causing the slowdown.
Please check the latency metrics and check which API operation is consuming more time.
You can open the Azure OpenAI resource from your portal and navigate to the metrics section and apply the splitting for the latency metrics and check which API / operationName was time consuming?
I would appreciate any suggestions or solutions that could help in resolving or mitigating this issue.
I would suggest you, check the documentation to Improve performance.
Here are some of the best practices to lower latency:
- Model latency: If model latency is important to you we recommend trying out our latest models in the GPT-3.5 Turbo model series.
- Lower max tokens: OpenAI has found that even in cases where the total number of tokens generated is similar the request with the higher value set for the max token parameter will have more latency.
- Lower total tokens generated: The fewer tokens generated the faster the overall response will be. Remember this is like having a for loop with
n tokens = n iterations. Lower the number of tokens generated and overall response time will improve accordingly.
- Streaming: Enabling streaming can be useful in managing user expectations in certain situations by allowing the user to see the model response as it is being generated rather than having to wait until the last token is ready.
- Content Filtering improves safety, but it also impacts latency. Evaluate if any of your workloads would benefit from modified content filtering policies.
Do let me know if that helps or have any other queries.
If the response helped, please do click Accept Answer and Yes for was this answer helpful.
Doing so would help other community members with similar issue identify the solution. I highly appreciate your contribution to the community.