Azure OpenAI batch jobs are getting stuck in the Validating stage, and some are failing because the enqueued token limit is reported as surpassed even though tokens are available

Seena Mary Augusty 0 Reputation points Microsoft Employee
2025-03-21T22:55:37.2933333+00:00

Many of our Azure OpenAI batch jobs have been stuck in the validating stage for the last couple of days. We first saw the issue on March 18, 2025. After two days of jobs being stuck, processing resumed on March 20, 2025 at 4:50 PM PDT, but at 7:50 PM PDT the same day jobs started getting stuck in the validating stage again. More than 12 hours later, they are still stuck in validating.
As of March 21, 3:54 PM PDT, the issue is ongoing.

We also submit batches every hour, and because of this issue some of them are failing because the enqueued token limit is reported as surpassed. This happens even though we dynamically calculate the input tokens before submitting each batch job and have not reached the enqueued token quota limit. Kindly take a look at this issue as well.
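
To make the pre-submission check concrete, here is a minimal sketch of this kind of bookkeeping. It is not our production code. It assumes, without confirmation from the service documentation, that tokens from jobs that have not yet reached a terminal status (including jobs stuck in validating) still count against the enqueued token quota. The helper names, the job records, and the quota value are all hypothetical.

```python
# Hypothetical bookkeeping sketch: names, job records, and the quota value
# are illustrative assumptions, not values from the actual deployment.

# Assumption: these statuses mean a job no longer holds enqueued tokens.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def enqueued_tokens(jobs):
    """Sum the estimated input tokens of jobs not yet in a terminal status."""
    return sum(
        job["tokens"] for job in jobs if job["status"] not in TERMINAL_STATUSES
    )


def can_submit(jobs, new_batch_tokens, quota):
    """Return True if the new batch fits under the quota alongside open jobs."""
    return enqueued_tokens(jobs) + new_batch_tokens <= quota


# Example job ledger; a job stuck in "validating" still counts here.
jobs = [
    {"id": "batch_a", "status": "validating", "tokens": 400_000},
    {"id": "batch_b", "status": "completed", "tokens": 300_000},
    {"id": "batch_c", "status": "in_progress", "tokens": 250_000},
]

if __name__ == "__main__":
    quota = 1_000_000  # hypothetical enqueued token quota
    print(enqueued_tokens(jobs))
    print(can_submit(jobs, 300_000, quota))
    print(can_submit(jobs, 400_000, quota))
```

If that assumption holds, hourly submissions could exceed the quota while earlier jobs remain stuck in validating, even when each individual batch is well under the limit.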

Here are the details of my setup:

  • Endpoint: https://live-signals-copilot.openai.azure.com/
  • Pricing Tier: Standard S0
  • Region: East US
  • Resource Configuration:
    • Name: Live-Signals-Copilot
    • Subscription: Customer Understanding
    • Subscription ID: 62e7b93a-1bbe-492b-b06a-3a93f3d269fd
    • Resource Group: Clarity

Could you help us resolve this issue as soon as possible? Our production endpoints rely on these batch jobs.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.