Azure OpenAI Availability rate down to 65%. 503 error

Tung Nguyen Xuan 70

Today I have frequently been getting service denials (503 errors) for chat completion requests with high token counts (~10K):

openai.InternalServerError: Error code: 503 - {'error': {'code': 'InternalServerError', 'message': 'The service is temporarily unable to process your request. Please try again later.'}}
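
Since the message asks the caller to try again later, a common client-side mitigation is to retry with exponential backoff. The following is only a minimal sketch, not the SDK's own retry logic: `call_with_retry` is a hypothetical helper, and it assumes the raised exception exposes a `status_code` attribute, as the `openai` library's status errors do.

```python
import random
import time


def call_with_retry(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); retry with exponential backoff on errors with status_code 503.

    Hypothetical helper: `call` is any zero-argument function that performs
    one request. Assumes max_attempts >= 1.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Errors without status_code == 503 are re-raised immediately,
            # as is the final 503 once the attempts are used up.
            if getattr(exc, "status_code", None) != 503 or attempt == max_attempts:
                raise
            # Backoff of base_delay * 1, 2, 4, ... plus up to 25% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.25))
```

It could be used as `call_with_retry(lambda: client.chat.completions.create(...))`, where `client` is an already configured client object (an assumption here). The `openai` Python client also accepts a `max_retries` setting, which may be enough on its own.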

The availability chart from monitoring is included below.

[Image: availability chart]

Deployment info

Model: gpt-4o
Deployment type: Global Standard
Rate limit (Tokens per minute): 13,565,000
Rate limit (Requests per minute): 81,390
Model version: 2024-11-20
Region: eastus

Troubleshooting in the Portal did not help.

I need to explain to my customers that the latency and unavailability are caused by Azure OpenAI, not by my production code.

I need urgent support right now.
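
To show customers where the failures come from, it can help to keep a client-side tally of response status codes next to the portal chart. This is a rough sketch under stated assumptions: `AvailabilityLog` is a hypothetical class, and it counts availability as the share of requests that did not end in a 5xx status, which may differ from how the portal metric is defined.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AvailabilityLog:
    """Client-side tally of HTTP status codes, one entry per request."""

    statuses: Counter = field(default_factory=Counter)

    def record(self, status_code):
        self.statuses[status_code] += 1

    def availability(self):
        """Fraction of recorded requests without a 5xx status, or None if empty."""
        total = sum(self.statuses.values())
        if total == 0:
            return None
        server_errors = sum(
            count for code, count in self.statuses.items() if 500 <= code < 600
        )
        return (total - server_errors) / total
```

Recording 200 for each successful call and the error's status code for each failure gives a figure that can be compared with the monitoring dashboard over the same time window.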

Tung Nguyen Xuan 70 Reputation points

2025-01-20T01:55:02.09+00:00

It's been 3 days, no response?
Saideep Anchuri 9,500 Reputation points Moderator

2025-01-20T04:31:29.59+00:00
Hi Tung Nguyen Xuan

Welcome to Microsoft Q&A Forum, thank you for posting your query here!

Sorry for the late response. No outage was reported for Azure OpenAI services over the weekend. Most likely, the server is becoming unhealthy because of the large size of the input queries (about 10K tokens). Consistently long queries have been observed to bring down server health. We strongly suggest the following steps to keep the server healthy and get faster responses.

Reduce long queries to short, clear, and concise ones so that the model builds context gradually and answers faster.

You can also lower the max_tokens setting to constrain the size of the generated output.

Create a new deployment in another region and follow the steps above to keep the server healthy and consistent. Please open a support ticket if you want a detailed root cause analysis (RCA) of this issue.
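
The last two suggestions (capping the output size and adding a deployment in another region) could be combined roughly as follows. This is a sketch under assumptions, not an official pattern: `complete_with_failover` is a hypothetical helper, and each `send` is assumed to be a function that issues one chat completion request against one deployment, for example a wrapper around `client.chat.completions.create`.

```python
def complete_with_failover(senders, messages, max_tokens=512):
    """Try each (name, send) pair in order and return (name, response).

    Moves on to the next deployment only when `send` raises an error whose
    status_code attribute is 503; any other error is re-raised immediately.
    """
    if not senders:
        raise RuntimeError("no deployments configured")
    last_error = None
    for name, send in senders:
        try:
            # max_tokens caps the length of the generated output.
            return name, send(messages=messages, max_tokens=max_tokens)
        except Exception as exc:
            if getattr(exc, "status_code", None) != 503:
                raise
            last_error = exc
    # Every deployment answered 503: surface the most recent error.
    raise last_error
```

The order of `senders` decides which region is preferred, so the primary deployment should be listed first.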

Thank You.
Tung Nguyen Xuan 70 Reputation points

2025-01-20T06:09:25.0666667+00:00

@Saideep Anchuri The issue I mentioned did not occur over the weekend, but last Friday. Today the issue seems to be gone. I have never had this issue before, and I have been sending high token count requests for a long time. I strongly believe this is due to the gpt-4o version, because when I reverted to 2024-08-06 the issue went away.
Tung Nguyen Xuan 70 Reputation points

2025-01-20T06:13:24.72+00:00

@Saideep Anchuri I apologize if this comes across as cynical, but in the past there were occasions when 503 errors were reported by users but the monitoring dashboard didn't reflect them.
Saideep Anchuri 9,500 Reputation points Moderator

2025-01-20T06:20:43.8633333+00:00

Hi Tung Nguyen Xuan

We are glad to hear that you were able to resolve the issue by changing the model version. The earlier justification was meant to explain the correlation between server health and recurring rate limits. I would strongly recommend opening a support ticket for a deeper investigation. If you are seeing any discrepancies, please create a support ticket.

Thank you.
Tung Nguyen Xuan 70 Reputation points

2025-01-20T07:55:25.9333333+00:00

@Saideep Anchuri Thanks for the clarification. I appreciate your attention to my issue.
Saideep Anchuri 9,500 Reputation points Moderator

2025-01-20T08:19:24.9266667+00:00

Hi Tung Nguyen Xuan

Thanks for your patience. Please tell us if anything in the conversation above was helpful to you, so that we can convert it to an answer. If you could then Accept Answer and Upvote it, that would help others in the community.

Thank you
Saideep Anchuri 9,500 Reputation points Moderator

2025-01-21T00:19:51.11+00:00

Hi Tung Nguyen Xuan

Following up to see if the given response was helpful.

Thank You.
Saideep Anchuri 9,500 Reputation points Moderator

2025-01-22T00:31:01.86+00:00

Hi Tung Nguyen Xuan

We haven’t heard from you since the last response and are just checking back to see if the given response was helpful.

Thank You.
Tung Nguyen Xuan 70 Reputation points

2025-01-22T13:51:38.3266667+00:00

@Saideep Anchuri Today the latency issue is back. I am having the same issue as the user here: https://learn.microsoft.com/en-us/answers/questions/2150196/high-latency-when-passing-images-to-azure-openai-g?comment=question#newest-question-comment
I tested gpt-4o (eastus, 2024-08-06) and gpt-4o mini. Both take minutes to respond when the payload contains images, even though the total token count (in + out) was just 1040.
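
A comparison like this is easier to report when the timings are collected the same way for each payload. Below is a minimal sketch, with `time_calls` as a hypothetical helper and `call` standing for any zero-argument function that sends one request (for example, one with images and one without).

```python
import statistics
import time


def time_calls(call, runs=5, clock=time.perf_counter):
    """Run call() `runs` times and summarize the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = clock()
        call()
        durations.append(clock() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Running it once with a text-only payload and once with an image payload against the same deployment gives two summaries that can be attached to a support ticket.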
Saideep Anchuri 9,500 Reputation points Moderator

2025-01-23T00:35:02.98+00:00
Hi Tung Nguyen Xuan

I'm sorry to hear you're experiencing latency issues again. This might be a known issue with Azure OpenAI, particularly when images are included in the payload.

If the issue still persists, please create a support ticket.

I recommend reporting this issue to the Azure support team. They will be able to investigate the issue further and provide a more targeted solution. You can report the issue by following these steps:

Go to the Azure portal and navigate to your OpenAI Service resource.

Click on the "Support + troubleshooting" tab.

Fill out the required information, including a detailed description of the issue and any steps you have taken to troubleshoot it.

Submit the support request.

The Azure support team will review your request and provide assistance as soon as possible.

Thank You.
