Responses API with gpt-4.1 running extremely slow in Sweden Central

Ola Ingvarsson 25 Reputation points
2025-05-14T08:36:55.1333333+00:00

We are experiencing extreme lag in streaming responses from the Responses API in Sweden Central.

I just wanted to flag this and see if anyone else is experiencing the same issue.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. SriLakshmi C 6,010 Reputation points Microsoft External Staff Moderator
    2025-05-14T10:12:47.53+00:00

    Hello @Ola Ingvarsson,

    It sounds like you're encountering unusually high latency when using the Responses API with the GPT-4.1 model in Sweden Central.

    I attempted to reproduce the issue in my environment using the GPT-4.1 model in the Sweden Central region, and it's working as expected without any noticeable latency or delays.

    This kind of performance degradation can be influenced by several factors:

    Ensure you're operating within the allowed Requests Per Minute (RPM) and Tokens Per Minute (TPM) quotas for GPT-4.1. For the default tier, GPT-4.1 supports up to 1,000 RPM and 1 million TPM. Exceeding these quotas can lead to throttling or delays.
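
    If throttling does turn out to be the cause, the usual client-side mitigation is to retry with backoff rather than resend immediately. Here is a minimal sketch, assuming the service signals throttling with HTTP 429 and may include a Retry-After header in seconds. The `send` callable and the `(status, headers, body)` tuple are hypothetical stand-ins for whatever HTTP client you use.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for illustration: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry on HTTP 429, up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        # Return anything that is not a throttle, or the last throttled attempt.
        if status != 429 or attempt == max_retries:
            return status, headers, body

        # Header names are case-insensitive, so normalise before the lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        retry_after = lowered.get("retry-after")
        if retry_after is not None and retry_after.isdigit():
            # Assumption: the server's hint is a whole number of seconds.
            delay = float(retry_after)
        else:
            # Otherwise back off exponentially, with jitter to avoid bursts.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        sleep(delay)

    raise RuntimeError("unreachable: the loop always returns")
```

    Passing `sleep` as a parameter keeps the function easy to test without real waiting.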

    If you're frequently making similar requests, implementing caching can reduce repeated calls to the service and improve response times. You can control caching behavior using the Cache-Control header. For example:

    Cache-Control: max-age=30
    

    This sets the cache validity to 30 seconds. Use directives like no-cache or no-store to bypass or disable caching as needed.
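
    If you would rather cache on the client side, the same 30-second idea can be sketched in application code. This is only an illustration under assumptions: `TTLCache`, `key_for`, and `get_or_call` are hypothetical names, and `call` stands in for your own function that actually sends the request. It only makes sense for requests where an identical response is acceptable.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache that reuses a result for ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key_for(request: Dict[str, Any]) -> str:
        # Canonical JSON so that key order does not change the cache key.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_call(
        self, request: Dict[str, Any], call: Callable[[Dict[str, Any]], Any]
    ) -> Any:
        key = self.key_for(request)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the remote call
        value = call(request)  # missing or expired: call and store
        self._store[key] = (now, value)
        return value
```

    Note that expired entries are only overwritten, never evicted, so a long-running process would need a size limit as well.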

    Large payloads, especially prompts with high token counts, can significantly increase response time. Try reducing the input size or capping the output length (max_output_tokens in the Responses API; max_tokens in Chat Completions).

    If latency remains high, consider deploying your model to another region temporarily (e.g., West Europe or North Europe) to compare performance. This helps determine whether the issue is regional or related to your specific deployment.

    Leverage tools like Azure Monitor or Application Insights to analyze request latency, identify spikes, and establish a performance baseline. This can help you determine if the issue is systemic or workload specific.
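
    For streaming lag specifically, it helps to separate time to first chunk from total time before comparing regions or deployments. Below is a minimal client-side sketch. `time_stream` and `StreamTiming` are hypothetical names, and `chunks` can be any iterable, for example the event stream your SDK returns for a streaming call.

```python
import time
from typing import Callable, Iterable, NamedTuple, Optional


class StreamTiming(NamedTuple):
    time_to_first_chunk: Optional[float]  # None if the stream was empty
    total_time: float
    chunk_count: int


def time_stream(
    chunks: Iterable[object],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Consume a stream and report when the first chunk and the end arrived."""
    start = clock()
    first: Optional[float] = None
    count = 0
    for _ in chunks:
        if first is None:
            # Latency until the service started answering.
            first = clock() - start
        count += 1
    return StreamTiming(first, clock() - start, count)
```

    Running the same prompt through this against two deployments gives comparable numbers: a high time_to_first_chunk suggests queuing or throttling, while a normal first chunk with a long total_time suggests slow generation or a very long output.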

    Also check the Azure Status page for any known outages or performance issues affecting the Sweden Central region. Latency is often caused by regional service disruptions or maintenance.

    I hope this helps. Do let me know if you have further queries.

    Thank you!
