Hello @Roni Mateless,
I tried to reproduce the slowness with GPT-4.1 in the East US 2 region in my own environment, but everything works normally, with no noticeable latency.
However, I understand you’re experiencing delays, and this can sometimes occur due to factors such as high regional demand or usage limits. Here are a few suggestions to help troubleshoot and improve response times:
Azure can occasionally experience outages or high demand in a specific region. Check the Azure status page for any ongoing incidents, and Service Health in the Azure portal for issues that affect your subscription specifically.
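Make sure you're operating within your allocated quota. When a deployment exceeds its limits, requests are throttled with HTTP 429 errors, and automatic client-side retries can look like slow responses. The usage limits for the GPT-4.1 model vary based on your subscription tier: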
- Enterprise Tier: Up to 5 million tokens per minute and 5,000 requests per minute
- Default Tier: Up to 1 million tokens per minute and 1,000 requests per minute
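To check whether an expected workload fits these limits, you can estimate its requests and tokens per minute. The sketch below is illustrative only: `Workload` and `check_quota` are hypothetical names, the limits are copied from the figures above (your deployment's assigned quota may be lower), and it assumes each request counts its prompt tokens plus its `max_tokens` setting toward TPM.

```python
from dataclasses import dataclass

# GPT-4.1 limits quoted above: (tokens per minute, requests per minute).
# Assumption: your deployment actually has these limits; check the portal.
TIER_LIMITS = {
    "enterprise": (5_000_000, 5_000),
    "default": (1_000_000, 1_000),
}


@dataclass
class Workload:
    requests_per_minute: int
    avg_prompt_tokens: int
    max_tokens: int  # completion cap sent with each request

    def tokens_per_minute(self) -> int:
        # Assumption: each request counts prompt tokens + max_tokens toward TPM.
        return self.requests_per_minute * (self.avg_prompt_tokens + self.max_tokens)


def check_quota(w: Workload, tier: str) -> list[str]:
    """Return the limits the workload would exceed (empty list if it fits)."""
    tpm_limit, rpm_limit = TIER_LIMITS[tier]
    problems = []
    if w.requests_per_minute > rpm_limit:
        problems.append(f"RPM {w.requests_per_minute} > {rpm_limit}")
    tpm = w.tokens_per_minute()
    if tpm > tpm_limit:
        problems.append(f"TPM {tpm} > {tpm_limit}")
    return problems


# 600 requests/min, ~2,000-token prompts, max_tokens=1,000
w = Workload(600, 2_000, 1_000)
print(check_quota(w, "default"))     # ['TPM 1800000 > 1000000']
print(check_quota(w, "enterprise"))  # []
```

In this example the request rate is well under both RPM limits, but the token volume exceeds the Default tier's TPM, so lowering `max_tokens` or the request rate would be the first thing to try.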
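GPT-4.1 supports a large context window, and larger requests take longer: long prompts take more time to process, and long completions take more time to generate. If you're seeing slowness, try reducing the size of your input prompts (and, where appropriate, lowering `max_tokens`) to see whether response times improve.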
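To see whether prompt size is what's driving the delay, you can time the same call at a few prompt lengths and compare. This is a minimal sketch: `latency_by_prompt_size` is a hypothetical helper, and `send_prompt` stands in for your own request function (for example, a wrapper around `client.chat.completions.create(...)` from the `openai` package's `AzureOpenAI` client). It reports the median over a few repeats so that a single slow call doesn't skew the result.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def latency_by_prompt_size(
    send_prompt: Callable[[str], object],  # your own call to the deployment
    base_text: str,
    sizes: Sequence[int],
    repeats: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, float]:
    """Median wall-clock seconds per call, keyed by prompt length in characters."""
    if not base_text:
        raise ValueError("base_text must be non-empty")
    results: Dict[int, float] = {}
    for size in sizes:
        # Build a prompt of exactly `size` characters by repeating base_text.
        prompt = (base_text * (size // len(base_text) + 1))[:size]
        timings = []
        for _ in range(repeats):
            start = clock()
            send_prompt(prompt)
            timings.append(clock() - start)
        results[size] = statistics.median(timings)
    return results
```

If latency grows roughly in proportion to prompt size, trimming the prompt should help. If it stays high even for small prompts, the cause is more likely regional load or throttling.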
I hope this helps. Let me know if you have any further questions.
Thank you!