Hello @Ola Ingvarsson,
It sounds like you're encountering unusually high latency when using the Responses API with the GPT-4.1 model in Sweden Central.
I attempted to reproduce the issue in my environment using the GPT-4.1 model in the Sweden Central region, and it's working as expected without any noticeable latency or delays.
This kind of performance degradation can be influenced by several factors, so here are a few things worth checking:
Ensure you're operating within the allowed Requests Per Minute (RPM) and Tokens Per Minute (TPM) quotas for your GPT-4.1 deployment. For the default tier, GPT-4.1 supports up to 1,000 RPM and 1 million TPM. Exceeding these quotas can lead to throttling (HTTP 429 responses) or delays.
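If you are being throttled, retrying with exponential backoff usually smooths things out. Here's a minimal sketch using the openai Python package; the endpoint, key, deployment name, and API version below are placeholders you'd replace with your own values:

```python
import time
from openai import AzureOpenAI, RateLimitError

# Placeholder values -- substitute your own endpoint, key, and an
# api_version that supports the Responses API in your region.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2025-03-01-preview",
)

def create_with_backoff(prompt: str, max_retries: int = 5):
    """Retry on 429 throttling with exponential backoff."""
    for attempt in range(max_retries):
        try:
            # On Azure, "model" is your deployment name.
            return client.responses.create(model="gpt-4.1", input=prompt)
        except RateLimitError:
            # Sleep 1s, 2s, 4s, ... before retrying.
            time.sleep(2 ** attempt)
    raise RuntimeError("Still throttled after retries; check your RPM/TPM quota.")
```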
If you're frequently making similar requests, implementing caching can reduce repeated calls to the service and improve response times. You can control caching behavior using the Cache-Control header, for example:

Cache-Control: max-age=30

This sets the cache validity to 30 seconds. Use directives like no-cache or no-store to bypass or disable caching as needed.
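Whether an HTTP-level Cache-Control header is honored depends on what sits in front of the service (e.g., an API Management gateway), so a simple in-process cache keyed on the prompt is often the easier win. A rough sketch, reusing the client from the snippet above:

```python
import functools

@functools.lru_cache(maxsize=256)
def cached_response(prompt: str) -> str:
    # Identical prompts within this process are served from memory
    # instead of triggering another API call.
    result = client.responses.create(model="gpt-4.1", input=prompt)
    return result.output_text
```

Note that lru_cache has no expiry, so it doesn't map exactly to max-age=30; add a timestamp check if entries need to go stale.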
Large payloads, especially prompts with high token counts, can significantly impact response time. Try reducing the input size or limiting max_tokens (max_output_tokens in the Responses API) in your request.
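For example (again with the client from the first snippet; the cap of 256 is arbitrary):

```python
response = client.responses.create(
    model="gpt-4.1",
    input="Summarize the incident report in three bullet points.",
    max_output_tokens=256,  # Cap generation length; shorter outputs return faster.
)
print(response.output_text)
```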
If latency remains high, consider deploying your model to another region temporarily (e.g., West Europe or North Europe) to compare performance. This helps determine whether the issue is regional or related to your specific deployment.
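One quick way to compare is to time the same request against both deployments. A rough sketch, where both endpoints and the key are placeholders:

```python
import time
from openai import AzureOpenAI

# Placeholder endpoints for the two deployments you want to compare.
endpoints = {
    "swedencentral": "https://<sweden-resource>.openai.azure.com",
    "westeurope": "https://<west-europe-resource>.openai.azure.com",
}

for region, endpoint in endpoints.items():
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_key="<your-api-key>",
        api_version="2025-03-01-preview",
    )
    start = time.perf_counter()
    client.responses.create(model="gpt-4.1", input="ping")
    elapsed = time.perf_counter() - start
    print(f"{region}: {elapsed:.2f}s")  # Run several times and compare medians.
```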
Leverage tools like Azure Monitor or Application Insights to analyze request latency, identify spikes, and establish a performance baseline. This can help you determine whether the issue is systemic or workload-specific.
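If you have diagnostic logs flowing to a Log Analytics workspace, you can also pull latency percentiles programmatically. A sketch using the azure-monitor-query package; the workspace ID is a placeholder, and the table/column names depend on your diagnostic settings, so verify them against your own workspace:

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# AzureDiagnostics/DurationMs is a common landing spot for Azure OpenAI
# request logs; adjust to match what your diagnostic settings emit.
query = """
AzureDiagnostics
| where TimeGenerated > ago(24h)
| summarize p50=percentile(DurationMs, 50), p95=percentile(DurationMs, 95)
          by bin(TimeGenerated, 1h)
"""

response = client.query_workspace(
    workspace_id="<your-workspace-id>",
    query=query,
    timespan=timedelta(hours=24),
)
for table in response.tables:
    for row in table.rows:
        print(row)
```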
Also check the Azure Status Page to verify whether there are any known outages or performance issues affecting the Sweden Central region; latency can often be caused by regional service disruptions or maintenance.
I hope this helps. Do let me know if you have further queries.
Thank you!