Hello Amey Sunu,
Welcome to Microsoft Q&A, and thank you for the detailed information.
An 8-minute response time from your gpt-5.1-chat deployment in Australia East is not typical under normal operating conditions, especially since your other model deployments are responding normally. Based on what you’ve described, this appears to be intermittent and region-specific, which helps narrow down the possible causes.
Regarding network latency: round-trip latency to Australia East is typically very low (single-digit milliseconds from nearby regions).
An 8-minute delay would not be caused by standard network latency alone. Since your request eventually completes successfully after increasing the HttpClient timeout, this suggests the request is being accepted and processed, rather than failing due to connectivity issues. That points more toward backend processing delay rather than a pure networking problem.
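Since the request completes once the client-side timeout is raised, it can help to keep a generous timeout while you investigate. Here is a minimal sketch using the Python openai SDK (the same idea applies to HttpClient.Timeout in .NET); the endpoint and deployment name are placeholders:

```python
import os

from openai import AzureOpenAI

# Raise the client-side timeout so long-running generations can complete
# while you diagnose. Endpoint and deployment name are placeholders.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
    timeout=600.0,  # seconds; roughly the equivalent of HttpClient.Timeout in .NET
    max_retries=0,  # disable SDK retries so one request maps to one timing
)

response = client.chat.completions.create(
    model="gpt-5.1-chat",  # your deployment name
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```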
There are several factors that can influence response time:
- Model type and workload characteristics
- Prompt size (input token count)
- max_tokens or total output tokens requested
- Whether streaming is enabled or disabled
- Overall system load or regional capacity pressure
Large prompts or high max_tokens settings can significantly increase generation time. If streaming is disabled and the model must generate a large completion before returning anything, the perceived latency can be much higher.
Since the issue is intermittent and specific to Australia East, this may indicate temporary regional capacity pressure or soft throttling. Unlike hard throttling (which returns HTTP 429), soft throttling can queue requests, resulting in long response times rather than immediate rejection. The fact that other regions or deployments behave normally further suggests this could be localized load behavior.
To further diagnose, I recommend the following.

Enable detailed logging for each call (a logging sketch follows this list):
- Total request duration
- Input/output token usage
- Correlation ID (x-request-id)
- Timestamp in UTC
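As a rough illustration of that logging, here is a sketch using the Python SDK's raw-response accessor, reusing the client from the earlier sketch; the printed field names are just examples:

```python
import time
from datetime import datetime, timezone

# Capture duration, token usage, x-request-id, and a UTC timestamp per call.
started_utc = datetime.now(timezone.utc).isoformat()
t0 = time.monotonic()

raw = client.chat.completions.with_raw_response.create(
    model="gpt-5.1-chat",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=64,
)
completion = raw.parse()  # the usual ChatCompletion object

print({
    "timestamp_utc": started_utc,
    "duration_s": round(time.monotonic() - t0, 2),
    "request_id": raw.headers.get("x-request-id"),
    "prompt_tokens": completion.usage.prompt_tokens,
    "completion_tokens": completion.usage.completion_tokens,
})
```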
Compare (a comparison sketch follows this list):
- The same request payload sent to another region
- The same deployment with a reduced prompt size or a lower max_tokens value
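A minimal way to run that comparison, assuming both resources share the deployment name (endpoints are placeholders, and note that each resource normally has its own API key):

```python
import os
import time

from openai import AzureOpenAI

# Identical payload sent to two regions so the durations are comparable.
payload = dict(
    model="gpt-5.1-chat",
    messages=[{"role": "user", "content": "Same prompt for both regions"}],
    max_tokens=256,
)

for endpoint in ("https://<aue-resource>.openai.azure.com",
                 "https://<other-region-resource>.openai.azure.com"):
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_key=os.environ["AZURE_OPENAI_API_KEY"],  # use the matching key per resource
        api_version="2024-06-01",
        timeout=600.0,
    )
    t0 = time.monotonic()
    client.chat.completions.create(**payload)
    print(endpoint, f"{time.monotonic() - t0:.1f}s")
```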
Test with streaming enabled to see whether tokens begin returning quickly but full completion takes longer. If streaming starts quickly, generation time is the main contributor.
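For example (again with the Python SDK, reusing the client from the first sketch), if the first token arrives within a few seconds but the full completion takes minutes, generation time dominates:

```python
import time

# Measure time-to-first-token vs. total time with streaming enabled.
t0 = time.monotonic()
first_token_at = None

stream = client.chat.completions.create(
    model="gpt-5.1-chat",
    messages=[{"role": "user", "content": "Summarise the benefits of streaming."}],
    max_tokens=512,
    stream=True,
)

for chunk in stream:
    # Azure can emit an initial chunk with no choices (content-filter metadata).
    if chunk.choices and chunk.choices[0].delta.content and first_token_at is None:
        first_token_at = time.monotonic() - t0

print(f"first token after {first_token_at}s, total {time.monotonic() - t0:.1f}s")
```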
Monitor Azure metrics for your Azure OpenAI resource (a metrics-query sketch follows this list):
- Server latency
- Throttled requests
- Requests per minute
- Retry counts
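If you prefer to pull those metrics programmatically rather than from the portal, a sketch with the azure-monitor-query package could look like the following; the resource ID and metric names are placeholders you would need to verify in the Metrics blade:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

# Placeholder resource ID for the Azure OpenAI (Cognitive Services) account.
resource_id = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.CognitiveServices/accounts/<resource-name>"
)

metrics_client = MetricsQueryClient(DefaultAzureCredential())
result = metrics_client.query_resource(
    resource_id,
    metric_names=["<latency-metric>", "<throttled-requests-metric>"],  # verify exact names in the portal
    timespan=timedelta(days=3),
    granularity=timedelta(hours=1),
)
for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(metric.name, point.timestamp, point.average)
```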
It would also be helpful to monitor response times over several days to see if there’s a pattern related to time-of-day spikes or usage peaks. That can help determine whether this is capacity-related behavior during high-demand windows.
If this continues, you may want to consider production mitigation strategies (a combined retry/failover sketch follows this list) such as:
- Deploying a secondary region and implementing failover
- Reducing max_tokens
- Enabling streaming responses
- Implementing retry logic with exponential backoff
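As a sketch of the first and last items combined, assuming a hypothetical secondary resource in another region (endpoint names and the failover policy are illustrative, not prescriptive):

```python
import os
import random
import time

from openai import APIConnectionError, APITimeoutError, AzureOpenAI, RateLimitError

# Hypothetical primary and secondary regional resources -- replace with your own.
clients = [
    AzureOpenAI(azure_endpoint="https://<primary>.openai.azure.com",
                api_key=os.environ["AZURE_OPENAI_API_KEY"],
                api_version="2024-06-01", timeout=120.0, max_retries=0),
    AzureOpenAI(azure_endpoint="https://<secondary>.openai.azure.com",
                api_key=os.environ["AZURE_OPENAI_SECONDARY_KEY"],
                api_version="2024-06-01", timeout=120.0, max_retries=0),
]

def chat_with_retry(messages, deployment="gpt-5.1-chat", max_attempts=4):
    """Retry with exponential backoff; fail over to the secondary region
    for the final attempts if the primary keeps timing out or throttling."""
    for attempt in range(max_attempts):
        client = clients[0] if attempt < max_attempts - 2 else clients[1]
        try:
            return client.chat.completions.create(
                model=deployment, messages=messages, max_tokens=512)
        except (APITimeoutError, APIConnectionError, RateLimitError):
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter, capped at 30 seconds.
            time.sleep(min(2 ** attempt + random.random(), 30))
```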
For more details, please refer to the Azure OpenAI documentation on performance and latency.
I hope this helps, do let me know if you have any further queries.
Thank you!