Hello GS,
If calls are suddenly taking 4+ minutes in East US 2 and this was not happening before, it is very unlikely to be an application issue. When latency jumps that dramatically across all calls, the usual causes are regional capacity pressure, a backend incident, or degradation along the network path.
You can check Azure Service Health and the Azure AI services status for East US 2 to see whether there is a live incident or advisory. High latency without errors often means the region is under heavy load and requests are being queued.
Let's test the same model in another region, if you have a deployment elsewhere. If latency drops immediately in the other region, that strongly points to a regional capacity issue.
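To make the cross-region comparison fair, it helps to send the same request several times per region and compare medians rather than single calls. The sketch below assumes you wrap each regional request in a zero-argument callable; `time_calls`, `compare_regions` and the region labels are hypothetical names for illustration, and the `time.sleep` lambdas are stand-ins for real requests, not the Azure SDK.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(call: Callable[[], object], runs: int = 3) -> List[float]:
    """Invoke `call` `runs` times and return each wall-clock latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def compare_regions(calls: Dict[str, Callable[[], object]], runs: int = 3) -> Dict[str, float]:
    """Return the median latency per region label so regions can be compared side by side."""
    return {region: statistics.median(time_calls(call, runs)) for region, call in calls.items()}


if __name__ == "__main__":
    # Hypothetical stand-ins: replace each lambda with a real request to your
    # deployment in that region (same model, same prompt, same max_tokens).
    fake_calls = {
        "eastus2": lambda: time.sleep(0.05),
        "other-region": lambda: time.sleep(0.01),
    }
    medians = compare_regions(fake_calls)
    for region, seconds in sorted(medians.items(), key=lambda kv: kv[1]):
        print(f"{region}: median {seconds:.3f}s")
```

Keeping the prompt and max_tokens identical across regions matters; otherwise a difference in output length can look like a regional difference.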
Also check whether you are using streaming or non-streaming responses. A non-streaming call with a large output can appear blocked until the whole response completes, especially if max_tokens is high.
If the delay reproduces in the Foundry playground and not just in your code, that strongly points to a regional backend issue. In that case, capture the request_id and timestamp and open a support ticket. Four-minute response times are not normal steady-state behaviour for that model in that region.
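One way to separate "long output" from "request stuck before generation" is to time a streamed response: record when the first chunk arrives and when the stream ends. The sketch below works on any iterable of chunks and also builds a small record with a request id and UTC timestamp for a support ticket. The function names, the fake generator and the header names checked (`x-request-id`, `apim-request-id`) are assumptions for illustration; confirm which headers your responses actually carry.

```python
import time
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, Optional


def measure_stream(chunks: Iterable[str]) -> Dict[str, float]:
    """Consume a stream; return time to first chunk, total time and chunk count.

    A short time-to-first-chunk with a long total suggests the output is simply long;
    a long time-to-first-chunk suggests the request waited before generation started.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    count = 0
    for _ in chunks:
        if first is None:
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return {"first_chunk_s": first if first is not None else total, "total_s": total, "chunks": count}


def support_record(headers: Dict[str, str], region: str, latency_s: float) -> Dict[str, str]:
    """Collect request id, UTC timestamp, region and latency for a support ticket.

    The header names below are assumptions; adjust them to what your responses return.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    request_id = next(
        (lowered[h] for h in ("x-request-id", "apim-request-id") if h in lowered), "unknown"
    )
    return {
        "request_id": request_id,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "region": region,
        "latency_s": f"{latency_s:.1f}",
    }


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streamed model response."""
    time.sleep(0.05)  # simulated wait before the first token
    for piece in ["Hello", " world"]:
        yield piece


if __name__ == "__main__":
    print(measure_stream(fake_stream()))
    print(support_record({"X-Request-ID": "example-id"}, "eastus2", 240.0))
```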
Regards,
Alex