my deployed endpoint began timing out for all requests to the model

Question

my deployed endpoint began timing out for all requests to the model

ns-7252 5

Introduction/Summary of Problem:

I am consistently encountering 408 Request Timeout errors when attempting to call a deployed chat model (gpt-oss-120b) via an Azure AI Studio (Foundry) inference endpoint. This is happening within the Playground experience in the Azure AI Studio portal.

Detailed Scenario:

I have an Azure AI Studio setup in the East US 2 region where I've deployed a chat model, gpt-oss-120b, through the Model Catalog. When interacting with this deployed model using the 'Playground' in the AI Studio portal, I receive a 408 Request Timeout error. This issue is consistent and reproducible.

Observed Outcome / Error Details (Crucial for Microsoft to diagnose):

The browser console shows the following POST request and subsequent 408 Request Timeout. The request is made to an endpoint formatted as https://[my-ai-hub-name].services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview.

POST https://[REDACTED_HUB_NAME].services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview 408 (Request Timeout)
XMLHttpRequest.send @ manualChunk_common_core-62bed741.js:1
... [truncated stack trace, include if it fits concisely] ...

Response message from Copilot/UI: Timeout: The operation was timeout. | Apim-request-id: 2cd4bad2-87bf-421b-82f9-e5d9b5dd8c20

Request Body (from browser console):

JSON

Relevant Request Headers (from browser console - REDACT ALL SENSITIVE INFO):

:authority: [REDACTED_HUB_DOMAIN]
:method: POST
:path: /models/chat/completions?api-version=2024-05-01-preview
:scheme: https
api-key: [REDACTED_API_KEY]
content-type: application/json
origin: https://ai.azure.com
referer: https://ai.azure.com/
request-id: [REDACTED_REQUEST_ID]
traceparent: [REDACTED_TRACEPARENT]
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36
x-ms-client-request-id: [REDACTED_CLIENT_REQUEST_ID]
x-ms-useragent: AzureOpenAI.Studio/ai.azure.com

APIM from playground when sending 1 message "test message":
Chat history

test message

Copilot said:

Azure AI Foundry

AI-generated content may be incorrect

Error

Timeout: The operation was timeout. | Apim-request-id: 1ac28c24-fbef-4be2-bcba-48ca94f7275b

New chat session startedThe assistant setup has been updated. Previous messages won't be used as context for new queries.New chat session started

test message again

Copilot said:

Azure AI Foundry

AI-generated content may be incorrect

Error

Timeout: The operation was timeout. | Apim-request-id: e34ecede-a08a-460b-a986-41a905696df1__

Environment Details:__

Location: East US 2

Deployed Model Name: gpt-oss-120b

API Version: 2024-05-01-preview

Approximate Timestamp of a recent failure (include time zone): [Insert specific date and time, e.g., "August 19, 2025, 1:40 PM EDT"]

Troubleshooting Efforts:

Confirmed that the endpoint URL and API key align with what the Azure AI Studio playground itself is using for this deployment.

Attempted to use az ml CLI commands to list related workspaces or endpoints, but these commands did not return expected results for deployments made via the AI Studio project, suggesting a different management approach for 'Foundry Models'.

I've checked for associated Application Insights resources for the AI Studio project's deployments, but I haven't been able to confirm one is connected for this telemetry.

I am aware of recent reports regarding intermittent 408 timeouts for Azure AI Studio/Foundry services in the East US region and am seeking to confirm if this is part of that broader issue or specific to my deployment.

Checked Azure Service Health for East US 2 for any related incidents (none currently reported).
Verified the model deployment status in the AI Studio portal shows 'Healthy'.Introduction/Summary of Problem: "I am consistently encountering 408 Request Timeout errors when attempting to call a deployed chat model (gpt-oss-120b) via an Azure AI Studio (Foundry) inference endpoint. This is happening within the Playground experience in the Azure AI Studio portal." Detailed Scenario: "I have an Azure AI Studio setup in the East US 2 region where I've deployed a chat model, gpt-oss-120b, through the Model Catalog. When interacting with this deployed model using the 'Playground' in the AI Studio portal, I receive a 408 Request Timeout error. This issue is consistent and reproducible." Observed Outcome / Error Details (Crucial for Microsoft to diagnose): "The browser console shows the following POST request and subsequent 408 Request Timeout. The request is made to an endpoint formatted as https://[my-ai-hub-name].services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview.
```
  POST https://[REDACTED_HUB_NAME].services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview 408 (Request Timeout)
```

XMLHttpRequest.send @ manualChunk_common_core-62bed741.js:1 ... [truncated stack trace, include if it fits concisely] ...

Response message from Copilot/UI: Timeout: The operation was timeout. | Apim-request-id: 2cd4bad2-87bf-421b-82f9-e5d9b5dd8c20

  
  __Request Body (from browser console):__
  
  JSON
  
  ```json
  {

Relevant Request Headers (from browser console - REDACT ALL SENSITIVE INFO):

  :authority: [REDACTED_HUB_DOMAIN]
:method: POST
:path: /models/chat/completions?api-version=2024-05-01-preview
:scheme: https
api-key: [REDACTED_API_KEY]
content-type: application/json
origin: https://ai.azure.com
referer: https://ai.azure.com/
request-id: [REDACTED_REQUEST_ID]
traceparent: [REDACTED_TRACEPARENT]
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36
x-ms-client-request-id: [REDACTED_CLIENT_REQUEST_ID]
x-ms-useragent: AzureOpenAI.Studio/ai.azure.com

Environment Details:

 __Location:__ `East US 2`
 
    __Deployed Model Name:__ `gpt-oss-120b`
    
       __API Version:__ `2024-05-01-preview`
       
          __Approximate Timestamp of a recent failure (include time zone):__ [Insert specific date and time, e.g., "August 19, 2025, 1:40 PM EDT"]

Troubleshooting Efforts:

Confirmed that the endpoint URL and API key align with what the Azure AI Studio playground itself is using for this deployment.

Attempted to use az ml CLI commands to list related workspaces or endpoints, but these commands did not return expected results for deployments made via the AI Studio project, suggesting a different management approach for 'Foundry Models'.

I've checked for associated Application Insights resources for the AI Studio project's deployments, but I haven't been able to confirm one is connected for this telemetry.

I am aware of recent reports regarding intermittent 408 timeouts for Azure AI Studio/Foundry services in the East US region and am seeking to confirm if this is part of that broader issue or specific to my deployment.

Checked Azure Service Health for East US 2 for any related incidents (none currently reported).

Verified the model deployment status in the AI Studio portal shows 'Healthy'.

I can provide more information privately. I am surprised I had to pay $29/month just to post to a public forum to verify why Azure's own service is not working.... please advise on resolution.

I observed this in my app first, then removed my app from the situation and went straight to playground and same behavior. I have used timeout params as long as 3 minutes and still same issue.

Locked Question. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

ns-7252 5 Reputation points

2025-08-19T21:06:49.37+00:00

Responses started to come through again after about 24hrs down with virtually no way to resolve myself...

Now I am getting some responses back, but they are taking a very long time in app. Playground seems to be failing intermittently.

Performance is extremely degraded > 24hrs with the majority of that time being down altogether because responses like below times out with no feedback. Below is an improvement that is still effectively not usable. Please advise.

test

Copilot said:

Azure AI Foundry| gpt-oss-120b

AI-generated content may be incorrect

Chain of thought summary

Hello! How can I assist you today?

hmmmm are you really working again

Copilot said:

Azure AI Foundry| gpt-oss-120b

AI-generated content may be incorrect

Chain of thought summary

Yes, I’m here and ready to help! What can I assist you with today?

how about now?

Copilot said:

Azure AI Foundry

AI-generated content may be incorrect

Error

Timeout: The operation was timeout. | Apim-request-id: 62108245-9976-4392-92d4-dff952282c70
Ertuğrul Cöre 0 Reputation points

2025-08-21T17:22:15.97+00:00

I'm having the same issue with gpt-oss-120b models deployed on west germany.

1 answer

ns-7252 5 Reputation points

2025-08-19T21:06:49.37+00:00

Responses started to come through again after about 24hrs down with virtually no way to resolve myself...

Now I am getting some responses back, but they are taking a very long time in app. Playground seems to be failing intermittently.

Performance is extremely degraded > 24hrs with the majority of that time being down altogether because responses like below times out with no feedback. Below is an improvement that is still effectively not usable. Please advise.

test

Copilot said:

Azure AI Foundry| gpt-oss-120b

AI-generated content may be incorrect

Chain of thought summary

Hello! How can I assist you today?

hmmmm are you really working again

Copilot said:

Azure AI Foundry| gpt-oss-120b

AI-generated content may be incorrect

Chain of thought summary

Yes, I’m here and ready to help! What can I assist you with today?

how about now?

Copilot said:

Azure AI Foundry

AI-generated content may be incorrect

Error

Timeout: The operation was timeout. | Apim-request-id: 62108245-9976-4392-92d4-dff952282c70
Ertuğrul Cöre 0 Reputation points

2025-08-21T17:22:15.97+00:00

I'm having the same issue with gpt-oss-120b models deployed on west germany.

Answer 1

ns-7252 5

Nevermind. Azure seems to have correcte the issue on their infrastructure behind the scenes. I want to cancel this supoprt plan as i did no receive a response in time and also the issue was never on my end. there should NOT be a pay barrier to communicate with Azure support about their own service degradations and problems that are scoped entirely to their infrastructure. How do I get a full refund and cancel renewal?

Andrew Burgess 30 Reputation points

2025-08-21T14:49:16.6566667+00:00

FYI I am having exactly the same issue, but in Sweden Central, and it is still not working.
Manas Mohanty 17,180 Reputation points Microsoft External Staff Moderator

2025-08-22T04:41:14.4666667+00:00
Hi ns-7252

Sorry for inconvenience. There was some issue on backend side which delayed prioritization of the case.

It seems you are facing operation time out on GPT-oss-120b model deployment in East US 2 with below error

Timeout: The operation was timeout. | Apim-request-id: e34ecede-a08a-460b-a986-41a905696df1

Suggestion -

Just tested the model in Sweden Central regions, it is very fast in some regions

Increase the TPM to avail higher request per second.

Ask

Is your resource protected with virtual network

Are you using any proxy service
Manas Mohanty 17,180 Reputation points Microsoft External Staff Moderator

2025-08-25T01:50:04.8033333+00:00

Hi ns-7252

Could you let us know if the above pointer were helpful to you.Thank you
Andrew Burgess 30 Reputation points

2025-08-26T06:52:06.7133333+00:00

FYI seems to be working now. Would have been nice to have seen some sort of status notice to save all of this hassle.
Ertuğrul Cöre 0 Reputation points

2025-08-26T14:11:36.62+00:00

If uptime will be like this, then we are better off hosting these models locally to be honest. Microsoft has to send downtime notifications and estimated resolve times.
Manas Mohanty 17,180 Reputation points Microsoft External Staff Moderator

2025-08-27T00:58:48.0966667+00:00

Hi Andrew Burgess and Ertuğrul Cöre

Sorry for the trouble. We create custom log search alert (latency issue) and metrics alert (408, 500, 429) and trigger an email or fail over to another endpoint through action group as Sometime Service health event does not report this intermittent transient issues.

https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-create-metric-alert-rule

https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-create-log-alert-rule

Thank you
Andrew Burgess 30 Reputation points

2025-08-27T06:59:09.3933333+00:00

Thank you. Is this alert applied to the hub, the project or the specific model?
Manas Mohanty 17,180 Reputation points Microsoft External Staff Moderator

2025-08-27T08:19:50.27+00:00

Andrew Burgess

You can set it on model level with additional filter.

Thank you
Wolfgang Sperger 0 Reputation points

2025-09-05T08:36:53.93+00:00

we are getting the same issue here, this is outrageous

Share via

my deployed endpoint began timing out for all requests to the model

1 answer