Hi there Gaurav Wagh
Thanks for using the QandA platform.
The first-request delay in Azure OpenAI models is likely due to cold-start latency. When a model hasn't been used recently, the service needs time to load resources and initialize before processing the request, which can produce a 14-15 second delay. Subsequent requests are faster because the model remains warm.
To reduce this, try keeping the model warm by sending periodic lightweight requests, or consider a provisioned/reserved capacity option for consistent performance. Also check region availability, network latency, and service quotas.
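As a minimal sketch of the keep-warm idea: the helper below calls a `ping` function on a fixed interval in a background thread. `ping` is a placeholder you would replace with whatever lightweight call suits your setup, for example a one-token chat completion against your Azure OpenAI deployment (the interval of 300 seconds is an assumption; tune it to how quickly your deployment goes cold).

```python
import threading

def keep_warm(ping, interval_s=300):
    """Call `ping` every `interval_s` seconds until the returned event is set.

    `ping` is assumed to send a minimal request (e.g. a chat completion
    with max_tokens=1) to the Azure OpenAI deployment to keep it warm.
    """
    stop_event = threading.Event()

    def loop():
        # Event.wait returns False on timeout, True once stop_event is set,
        # so this loop fires `ping` once per interval until stopped.
        while not stop_event.wait(interval_s):
            try:
                ping()
            except Exception:
                pass  # a failed keep-alive should not crash the app

    threading.Thread(target=loop, daemon=True).start()
    return stop_event
```

Usage: start it once at application startup (`stop = keep_warm(my_ping)`) and call `stop.set()` on shutdown. The daemon thread also exits automatically when the process ends.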
If this helps, kindly accept the answer. Thanks!