@Darshan Gupta Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!
Several factors could contribute to the higher latency you are seeing in production:
- Network Latency: Network latency in your production environment could be higher than in your local environment, either because of the physical distance between the server and the Azure region or because of network congestion. Please check where your production application is hosted. If possible, deploy it in the same region as your Speech resource.
- Concurrency: The Azure AI Speech service can autoscale, but scaling out takes time. If concurrency increases sharply within a short period, the client may see longer latency or even receive a 429 (Too Many Requests) error. We therefore recommend increasing concurrency gradually during load testing. See this article for more details, especially this example of workload patterns.
- Recommendations: The recommendations for lowering latency are described in this article. Please review and apply them.
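The concurrency advice can be tried with a simple step-wise ramp-up load test. This is a minimal Python sketch, not an official Azure tool: `send_request` is a hypothetical placeholder for whatever call your application makes to the Speech service (assumed here to return an HTTP-style status code), and the step sizes, retry count, and delays are illustrative assumptions you should tune for your workload.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stand-in for one Speech request that returns an HTTP-style status code.
# Replace it with your real call; nothing here is part of the Speech SDK.
RequestFn = Callable[[], int]


def call_with_backoff(send_request: RequestFn, max_retries: int = 5,
                      base_delay: float = 0.5) -> int:
    """Retry on 429 (Too Many Requests) with exponential backoff plus jitter."""
    status = send_request()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Wait base_delay * 2^attempt seconds, plus up to 10% random jitter.
        delay = base_delay * (2 ** attempt)
        time.sleep(delay + random.uniform(0, delay * 0.1))
        status = send_request()
    return status


def ramp_up(send_request: RequestFn, steps: list[int], requests_per_step: int,
            pause_s: float = 0.0) -> dict[int, dict[str, float]]:
    """Run load at increasing concurrency levels and record latency per step."""
    results = {}
    for concurrency in steps:
        latencies = []
        throttled = 0

        def timed_call() -> tuple[float, int]:
            start = time.perf_counter()
            status = call_with_backoff(send_request)
            return time.perf_counter() - start, status

        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            for elapsed, status in pool.map(lambda _: timed_call(),
                                            range(requests_per_step)):
                latencies.append(elapsed)
                if status == 429:
                    throttled += 1  # still throttled after all retries

        latencies.sort()
        results[concurrency] = {
            "p50_s": latencies[len(latencies) // 2],  # approximate median
            "max_s": latencies[-1],
            "still_throttled": throttled,
        }
        # Give the service time to scale out before the next, higher step.
        time.sleep(pause_s)
    return results
```

For example, `ramp_up(my_call, steps=[1, 2, 4, 8], requests_per_step=50, pause_s=60)` raises concurrency in stages instead of jumping straight to peak load. If latency or throttling climbs sharply at one step, that step is a good place to slow the ramp.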
On a side note:
You can also check the latency metrics to identify which operation is taking the most time, and compare the results between your production and local development environments.
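One way to compare environments is to time each stage of your own client code in both places and compare the medians. This is a minimal Python sketch. The `LatencyRecorder` class and the operation names (such as `"connect"` or `"recognize"`) are hypothetical and do not come from the Speech SDK. Client-side timings like these complement the service's latency metrics rather than replace them.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median


class LatencyRecorder:
    """Collects wall-clock durations per named operation (hypothetical helper)."""

    def __init__(self) -> None:
        self.samples: dict[str, list[float]] = defaultdict(list)

    @contextmanager
    def measure(self, operation: str):
        # Time the body of the `with` block, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[operation].append(time.perf_counter() - start)

    def summary(self) -> dict[str, dict[str, float]]:
        # Per-operation sample count, median, and worst case, in seconds.
        return {
            op: {"count": len(v), "median_s": median(v), "max_s": max(v)}
            for op, v in self.samples.items()
        }


def compare(local: dict, prod: dict) -> dict[str, float]:
    """Ratio of prod to local median latency for operations measured in both."""
    return {
        op: prod[op]["median_s"] / local[op]["median_s"]
        for op in local.keys() & prod.keys()
        if local[op]["median_s"] > 0
    }
```

In each environment, wrap the relevant calls, for example `with recorder.measure("recognize"): ...`, and then pass both `summary()` results to `compare`. An operation with a much larger ratio than the others shows where the production slowdown comes from, such as connection setup, which points to network distance.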
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.